FIELD OF THE INVENTION

This invention relates to the field of knowledge systems. More specifically, this 
invention relates to the application of knowledge systems to machine translation, to
natural language processing, and to artificial intelligence systems. 



BACKGROUND 



I. INTRODUCTION

For several decades, researchers in various areas of computer science have 
attempted to develop methods to enable machines to understand the natural language 
spoken and written by human beings (e.g., English, Chinese, Arabic) in a scalable, 
automated fashion. While computers can perform specific tasks for which they've been
programmed, the state of the art does not provide a method or system for automated
general understanding of the meaning of words and phrases in context.

Many applications, including machine translation (or MT) of human languages, 
voice recognition technology, search, retrieval and text mining systems, and artificial 
intelligence applications, require automated understanding of natural language in order to
be fully effective. The obvious benefits of such applications, if broadly enabled, have
motivated universities, governments and corporations to invest many decades of time and
collectively billions of dollars of capital looking for a method that would enable
computers to process and understand written or spoken natural language. Given the
significant effort in these fields without a breakthrough, many in the scientific
community question whether true machine understanding of natural language is possible.
Even most advocates of the idea that computers will one day be capable of wide-ranging 
human-type understanding see that time as still decades away. 



II. STATE OF THE ART OF MACHINE TRANSLATION

Most language translation to date is performed by skilled and expensive human 
translators. Automating the language translation process would have major economic 
benefits ranging from significant cost reduction of translation to enabling new
time-sensitive translation applications like on-the-fly cross-language text or voice
communications and multilingual daily news publications.

Machine translation devices and methods for automatically translating documents 
from one language to another are known in the art. However, these devices and methods 
often fail to accurately translate sentences from one language to another and therefore 
require human beings to substantially edit the many errors made by the devices before
output translations can be used for most applications. The current state of the art systems
accurately resolve 60% to 80% of the words they translate among the Latin languages,
but the percentage of publishable quality sentences translated by these systems in a broad
domain is typically less than 40%. The accuracy of existing machine translation systems
for non-Latin based languages is even lower. The only exceptions are narrowly
customized special purpose machine translation systems that do not generalize across
application domains. Moreover, most commercially deployed machine translation
systems require man-decades of development for each direction of each language pair.

Achieving accurate machine translation is more complicated than providing
devices and methods that make word-for-word translations of documents. Because each 
word's meaning is highly dependent on the context it is found in, simple word-for-word 
translation of sentences results in wrong word choices, incorrect word order, and 
incoherent grammatical units.

To overcome these deficiencies, known translation devices have been designed to 
attempt to make choices of word translations within the context of a sentence based on a 
combination or set of lexical, morphological, syntactic and semantic rules. These 
systems, which have been developed for over 40 years and are known in the art as
"Rule-Based" machine translation (Rule-Based MT) systems, are flawed because there are so
many exceptions to the rules that they cannot provide consistently accurate translation.
The most prominent company providing machine translation based primarily on the
Rule-Based method is Systran, which began the development of its machine translation
engines in the 1960s. Rule sets are laboriously handcrafted and always incomplete, as it is
extremely difficult if not impossible for human developers to encompass all the nuances
of language in a finite set of rules.

In addition to Rule-Based MT, in the last two decades a new method for machine 
translation known as "Example-Based" machine translation (EBMT) has been developed. 
EBMT makes use of sentences (or possibly portions of sentences) stored in two different 
languages in a cross-language database. When a translation query in the Source
Language matches a sentence in the database, the translation of the sentence in the Target
Language is produced by the database, providing an accurate translation in the Target
Language. If a portion of a translation query in the Source Language matches a portion
of a sentence in the database, these devices attempt to accurately determine which portion
of the Target sentence (that is mapped to the Source Language sentence) is the translation
of the query. "Source" refers to the content in one language or state that is being
translated into another language or state; "Target" refers to content in a language or state
that the Source is being translated into.

EBMT systems known in the art cannot provide accurate translation of a language 
broadly because the databases of potentially infinite cross-language sentences will always 
be predominantly "incomplete." And since EBMT systems do not reliably translate 
partial matches and sometimes incorrectly combine correctly translated portions, the
accuracy of these systems is roughly comparable to that of the Rule-Based engines.

Another machine translation approach that is often used independently, as well as 
in conjunction with EBMT, is Statistical Machine Translation (SMT). SMT systems 
attempt to automate the translation process using pairs of translated documents in 
combination with a large corpus of documents in just the Target Language. Compared to
Rule-Based MT, both EBMT and SMT significantly reduce the time to develop a
translation engine for a pair of languages. The accuracy of SMT systems is comparable to
Rule-Based MT and EBMT systems and is, therefore, not adequate for the production of 
translated documents in a broad domain. 

SMT systems use what is known in the art as an "n-gram model" and are based on
Shannon's "noisy channel model" for information transfer. These methods assume
translation to be imperfect, and by design, SMT methods produce translations according to
their probability of being correct given the training corpora. These methods take a
"best guess" at translations for each word based on the two, or at most three, other
adjacent words in the Source and Target Languages. These methods gain less marginal
benefit with increases in the size of the cross-language and Target Language training 
corpora, and have continued to make only incremental improvements over the last several 
years. For example, one of the higher quality SMT systems developed over the past 
5 years at the University of Southern California recently published the results of a test of 
their SMT system. After training on the domain-specific corpus (the Canadian 
Legislature proceedings), their system translated 40% of the text sentences correctly 
(AMTA 2002 Proceedings, Oct. 2002). 

Some translation devices combine Rule-Based MT, SMT and/or EBMT engines 
(called Multi-Engine Machine Translation or MEMT). Although these hybrid
approaches may yield a higher rate of accuracy than any system alone, the results remain
inadequate for use without significant human intervention and editing. 

III. STATE OF THE ART OF STATISTICAL NATURAL LANGUAGE 
PROCESSING FOR SEMANTIC ACQUISITION

The field of statistical natural language processing (NLP) includes the research
and development of automated machine learning from text for various applications. One
application of NLP is SMT for machine translation, as discussed above. Although various
NLP methods attempt to extract the meaning from natural language, as a leading textbook
on the subject makes clear, the state of the art is far from a solution: "The holy grail of
lexical acquisition is the acquisition of meaning. There are many tasks (like text
understanding and information retrieval) for which Statistical NLP could make a big
difference if we could automatically acquire meaning. Unfortunately, how to represent
meaning in a way that can be operationally used by an automatic system is a largely
unsolved problem." (Manning and Schütze, Foundations of Statistical Natural Language
Processing, 5th printing, 2002, p. 312).

There is a great need for organizations to better manage the knowledge they've 
captured in unstructured text such as word-processed documents, PDF files, email 
messages and the like. Although information previously assembled in databases can be
searched and retrieved effectively, a practice referred to in the art as data mining, the 
broad mining of unstructured text (representing 80% or more of the world's data) to look 
for ideas and concepts is not currently possible using the state of the art systems. While 
Boolean and other keyword search methods find information using the words contained
in the user's query, most ideas and concepts can be expressed in a large number of
different ways, many of which will not exactly or even approximately contain a particular
keyword or other search term. This means many relevant documents that would be
identified when conducting a "concept-based" search (which is not limited to the query
words the user provides) will be missed when a keyword search is conducted.

For instance, if the word string "terms and conditions" is submitted in quotes
(indicating the exact string) as part of a keyword search, the system will find references
to "terms and conditions" but will not identify other words and word strings (a word string is
two or more adjacent words in a specific order) or other abbreviations or representations
expressing the same idea that may be of interest to the user, such as "conditions of use",
"restrictions", "tos", "terms of service", and "rules and regulations". The ability of a
system to add close semantic equivalents to the search query when looking for relevant
information would enhance the quality and efficiency of search in a variety of ways.
Moreover, there are no comprehensive phrasal level synonymy or near-synonymy
dictionaries. They simply do not exist because there are too many two- and three-word
terms to manually create synonym lists for each, let alone all the terms that are longer 
than three words. Existing methods to automatically generate thesauri using patterns in 
text have had limited success in the broad semantic acquisition of natural language. The 
state of the art methods for concept extraction using patterns of words that occur in text
include similarity assessment methods such as vector space models using various 
measures. Some of these methods attempt to find synonymous or related words by 
identifying individual words as points of context. 
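
The contrast between an exact keyword search and a search that also accepts close semantic equivalents can be illustrated with a minimal sketch. The `EQUIVALENTS` table, the function names, and the sample documents are all assumptions made for illustration; the table is written by hand here, whereas the point of the passage above is that no comprehensive table of this kind exists.

```python
import re

# Hypothetical, hand-written table of equivalents; the phrases are the ones
# listed in the text above. A real system would have to learn such a table.
EQUIVALENTS = {
    "terms and conditions": [
        "conditions of use", "restrictions", "tos",
        "terms of service", "rules and regulations",
    ],
}

def contains(phrase, document):
    """True if `phrase` occurs in `document` as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, document, flags=re.IGNORECASE) is not None

def keyword_search(query, documents):
    """Exact-string search: only documents containing the query itself."""
    return [d for d in documents if contains(query, d)]

def concept_search(query, documents):
    """Search for the query plus any listed equivalents of it."""
    variants = [query] + EQUIVALENTS.get(query.lower(), [])
    return [d for d in documents if any(contains(v, d) for v in variants)]

docs = [
    "Please read the Terms and Conditions before ordering.",
    "Our terms of service changed in March.",
    "The photos were taken outside.",
]
exact = keyword_search("terms and conditions", docs)
expanded = concept_search("terms and conditions", docs)
```

Here `exact` holds only the first document, while `expanded` also picks up the second, which never uses the query words together.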

Some methods consider words that are different distances from a query and
focus on the proximity and frequency of co-occurrence of individual words in relation to
the query. These methods include an n-gram based method (Martin, Ney: Algorithms for
Bigram and Trigram Word Clustering, Speech Communication 24, pp. 19-37, 1998;
Brown et al.: Class-Based N-gram Models of Natural Language, Computational
Linguistics, 18(4), pp. 467-479, 1992) and the Window-based Method (Brown et al.).
Other related work in this area includes: Finch & Chater (1992, Bootstrapping Syntactic
Categories Using Statistical Methods); Schütze & Pedersen (1997, A Co-Occurrence-Based
Thesaurus and Two Applications to Information Retrieval), among many others.
While the contextual information has provided some results, the breadth and accuracy of
the results achieved using these methods have been limited and, therefore, they have had
limited practical application in commercial products for search and retrieval, content
management, and knowledge management.

Most advanced search and text mining applications use manually assembled 
linguistic rules, semantic knowledge, and ontologies and taxonomies. These methods and 
systems can be used to provide semantic clues for meta-tagging data by category as well
as other purposes. In addition, some systems incorporate various supervised and
unsupervised statistical learning and extraction methods including Bayesian methods
assessing relevance probabilities to add to the analysis for search and/or categorization.
These systems do not effectively mine text because the methods do not yield consistently
accurate (i.e., relevant) search results. Additionally, because meta-tagging involves the
pre-defining of information into categories to be used as part of enhanced search, the
category determination requires that static labels be put on multi-dimensional ideas (that
may also evolve or change categories over time). None of these systems are designed to
mine information to find other words and phrases of equivalent meaning to query terms.

The ability of a system to identify semantically equivalent alternative
representations of a word or word string within a language has many applications. The 
ability to generate synonymous expressions for any expression, in addition to text mining, 
is also a very effective component of any corpus-based machine translation system. In
addition, the ability to identify expressions of equivalent meaning is machine
understanding of natural language, and this ability could provide the foundation for
artificial intelligence (AI) applications.

IV. STATE OF THE ART OF ARTIFICIAL INTELLIGENCE 

The most ambitious goal of machine understanding of human language is for use
in a system that achieves full-scale human quality intelligence, i.e., a system that is
capable of reasoning rationally and exhibiting human-type common sense. This field of
computing, referred to as "Strong AI," has as its ultimate goal to enable computers to
understand natural language, interact with people or other computers using natural
language, learn concepts, make insights, and perform cognitive tasks. While a machine
translation system has the task of understanding information only to the level necessary
for the purpose of converting the information into another form, Strong AI applications
need the capability to not only understand information and its other forms and states, but
also to manipulate that information in a way that triggers the system to learn to answer
questions and perform other cognitive tasks, such as draw conclusions from premises,
discover relations from observations, and set sub-goals to pursue further knowledge
gathering in anticipation of expected future needs.

The mathematician Alan Turing devised the Turing Test in 1950 as a conceptual
design for testing whether a machine achieved human quality intelligence. Although a
machine that passed the Turing Test would not necessarily completely fulfill the promise 
of all the ambitions of Strong AI, even the most optimistic proponents of Strong AI feel 
that a computer will not convincingly pass the Turing Test for decades. 

AI methods known in the art vary in approach. The vast majority of commercial
AI applications address far more narrow tasks than the goals of Strong AI. These
applications are sometimes referred to as "Weak AI" and produce at best "idiot-savant"-type
systems capable only in the confines of a narrow task such as playing master-level
chess. Various methods used to produce these systems include manually encoding
knowledge and rules, and systems that can learn how to generalize certain encoded
knowledge to perform narrowly defined tasks. Other methods like neural nets have been
developed to train systems to learn, again in very narrowly defined domains. In the
absence of a true breakthrough that enables broad machine understanding of natural
human languages, the focus on narrow problems enables practical applications for
specific tasks. 

There have been relatively few Strong AI software initiatives. Typically, Strong
AI systems known in the art manually encode knowledge using a specific computer
language designed for that purpose and then employ a system to manipulate that
knowledge in the aggregate to attempt to answer questions or perform tasks. The most
prominent example of a Strong AI system using a manually created ontology of encoded
knowledge is the Cyc system developed at Cycorp by computer scientist Doug Lenat.
The Cyc system requires human beings to manually encode a vast amount of common
sense knowledge as well as domain-specific knowledge (and understand the different
representations of that knowledge), which are "rules" for the system to follow. An
example of a hand encoded rule or piece of knowledge for Cyc might be "once people die
they stop buying things" or "trees are usually outside." Cyc has been in development
since 1984 without producing a system with wide ranging human intelligence. To date,
its developers have hand encoded fewer than 2 million of these very specific rules.

An enabling breakthrough in Strong AI would have far reaching implications. 
The pace of technological advancement would increase dramatically as scalable
computer processing and memory, armed with human quality intellect, are focused on the
issues and problems we all face. A fundamental breakthrough in Strong AI could
literally change the world as we know it.

SUMMARY OF INVENTION



I. INTRODUCTION 

The present invention provides a method and apparatus for automating the
acquisition, reconstruction, and generation of knowledgebases of associated ideas and
using such knowledgebases in many applications including machine translation of human
languages, search and retrieval of unstructured text (or other data) based on concept
search (not keywords), voice recognition, data compression, and artificial intelligence
systems. In the present invention, knowledgebases of associated ideas are created by
studying the relationships between ideas as they recur in an unstructured body of
information. The expressions of ideas may be, but need not be, similar in number, length,
or size; and they may be expressed or represented in any medium (e.g., text, visual
images, sounds, infrared waves, smells, symbols).

The present invention also provides a method and apparatus for creating and
utilizing knowledgebases to convert ideas from one state into other states, and to
otherwise manipulate the knowledgebases for practical applications.

In one embodiment of the present invention, the knowledgebases created are
reconstructed in limitless derivations to be used for human language translation
applications. Another embodiment of the present invention may be used to create a
knowledgebase of associations between ideas to establish their relationship to one
another. These associations/relationships of ideas can be used as trigger events for
artificial intelligence applications when two or more types of ideas appear together in
certain patterns.

The basic aspects of the present invention are knowledgebase acquisition,
knowledgebase reconstruction, knowledgebase generation, and the use of 
knowledgebases to convert ideas and otherwise manipulate the knowledgebases for 
practical applications. The knowledgebase acquisition aspect of the present invention 
identifies ideas and their representations in different states. Thus, for applications that 
manipulate written text, the present invention identifies the meaning of word and word 
string units, including ideas in different languages that are translations of one another, 
and ideas that are synonymous expressions within a single language. The knowledge 
acquisition component of the present invention also identifies non-synonymous words 
and word strings that are nevertheless related semantically in some way (e.g., opposites, 
common class members, generally related ideas). 

The knowledge reconstruction aspect of the present invention pieces together the 
units of meaning learned through knowledge acquisition into limitless derivations of 
more complex ideas. This allows the knowledgebases of associated ideas to be used as 
building blocks to manipulate broad ranges of ideas in different states, or within one 
state. Thus the knowledgebases of associated ideas may be used to translate entire 
documents into a Target Language as well as represent complex ideas in different forms 
within a single language, thus enabling automated understanding for applications such as 
concept search, natural language interfaces, voice recognition, and the like. 

The knowledge generation aspect of the present invention uses recognized 
patterns of connected complex ideas to trigger the use of previously learned knowledge 
(or the learning of new knowledge) to perform a cognitive task. The present invention 
achieves these and other objectives by identifying multiple ways of expressing each
recurring idea and establishing the relationships between different ideas. Thus, in one
embodiment of the present invention, the ideas are expressed in human language and the
system makes associations by documenting the frequency and proximity relationship of
two or more ideas and their co-occurrence in text. As stated before, the ideas are
represented by word strings of any size.



II. WORD STRINGS AS UNITS OF MEANING 

Unlike the existing state of the art of SMT systems, vector space measures for
semantic similarity, and other NLP supervised or unsupervised learning, the present
invention matches and/or associates patterns of recurring word strings of any size with
other recurring word strings of any size. This technique of examining exact word strings
including stop words (words such as "it", "an", "a", "of", "as", "in") as single units of
meaning in unstructured text applies to all aspects of the present invention. By
identifying and focusing on recurring words or word strings of any length as a single unit,
the present invention captures the meaning of words in context.
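
As a minimal sketch of what treating recurring word strings of any length as single units could look like, the following counts every contiguous word string in a text, stop words included, and keeps those that recur. The function name, the whitespace tokenization, and the `min_count` and `max_len` parameters are assumptions for illustration, not details taken from the invention.

```python
from collections import Counter

def recurring_word_strings(text, min_count=2, max_len=None):
    """Count every contiguous word string and keep those that recur.

    Stop words are deliberately kept: "a rock" and "rock" are distinct units.
    Tokenization by whitespace is a simplifying assumption.
    """
    words = text.lower().split()
    limit = len(words) if max_len is None else max_len
    counts = Counter(
        " ".join(words[i:i + n])
        for n in range(1, limit + 1)
        for i in range(len(words) - n + 1)
    )
    return {ws: c for ws, c in counts.items() if c >= min_count}

text = ("caught between a rock and a hard place again "
        "she was between a rock and a hard place")
units = recurring_word_strings(text)
```

In this toy text the seven-word string "between a rock and a hard place" recurs and is therefore kept as a unit in its own right, alongside its shorter sub-strings such as "a rock".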

For example, the present invention treats "rock" as potentially representing a
variety of meanings depending on context (e.g., a stone or a kind of music). When
word strings are considered, further meanings become apparent: "a rock" could represent a stone
or a solid individual in tough times; "a rock band" can represent a group of musicians that
play rock music. Likewise, the contiguously appearing words "between a rock" take on
different meanings depending on the larger word strings they appear in. If they exist in
the word string "between a rock band's sets", the meaning is quite different than when
they are found in "between a rock and a hard place". Furthermore, the expression
"between a rock and a hard place" taken as a whole has a meaning that would not easily
be understood by analyzing its parts. 

The present invention's treatment of each recurring word string in language as a 
separate idea stands in stark contrast to existing automatic semantic acquisition methods 
for machine translation and machine understanding. In addition, the present invention's 
treatment of each recurring word string in language as a separate idea contrasts with 
modern linguistic theory, which focuses on the semantic value of individual words in the 
context of other individual words. The terms "collocation" and "idiom" in linguistic
theory refer to the special cases where a word string is taken as a whole because the
multi-word expression has taken on a meaning that cannot be easily discerned by looking
at the component words. In effect, the component words have lost their individual
semantic value and only relate to the idea expressed when taken as part of the whole.

For instance, a term like "pitch black" is an example of a collocation and
"between a rock and a hard place" is an example of an idiom. In contrast, the present
invention treats not just words, collocations, and idioms as atomic units of meaning;
rather, it treats all word strings as potential atomic units of meaning. The present
invention allows words within a word string to maintain their core semantic value, 
change their core semantic value in subtle ways, or completely diverge from their typical 
meaning, depending on the exact string of words they are found in. 

For example, "baseball" is a kind of game, "a baseball" is a round object, "a 
baseball team" is a sports franchise, and "a baseball player" is a person. The present 
invention manipulates these different word strings involving a common word (baseball) 
individually as independent ideas when manipulating units of meaning in applications 
requiring machine understanding of natural language. While the present invention does
not use linguistic rules for grammar and does not label word strings by their parts of 
speech, the methods of the present invention allow the context of the word string to be 
manipulated as a unit and preserve its linguistic qualities. 

III. METHODS AND SYSTEMS FOR LANGUAGE TRANSLATION AND 
NATURAL LANGUAGE UNDERSTANDING FOR TEXT MINING, 
NATURAL LANGUAGE INTERFACE AND OTHER APPLICATIONS 

A. Overview

The present invention provides several methods and apparatuses for creating and 
supplementing cross-language association databases (i.e., knowledgebases) of ideas. 
These databases generally associate data in a first form or state that represents a particular
idea or piece of information with data in a second form or state that represents the same
idea or piece of information. These databases are then used, for example, to facilitate the
efficient translation from one state to another of documents containing these ideas using
the knowledge reconstruction method of the present invention referred to as the
dual-anchor overlap.

One method for building cross-language word string translation databases uses
documents previously translated by human beings (Parallel Text) to recognize
co-occurrence of word strings across the translated documents. A second method of the
present invention for building cross-language word string translation databases deduces a
word string translation between a language pair by using known word string translations
from several other language pairs. Another method of the present invention uses a
cross-language dictionary along with a large Target Language corpus and certain search
techniques to identify word string translations. Another method of the present invention,
known as dual-anchor overlap, expands cross-language word string databases by
automatically deducing new associations from previously learned associations (this is
also the knowledge reconstruction aspect of the present invention).

Another method and system for the knowledge acquisition aspect of the present
invention creates knowledgebases of related ideas in a single language or state by
examining multiple occurrences of an idea expressed in that one language or state. For
example, in the present invention it is possible to create a knowledgebase of associated
ideas in English by examining the recurrence of ideas represented by words and word
strings in different documents in English. The present invention performs knowledge
acquisition on an idea expressed (by a word or word string) in a single language by
examining the co-occurrence of surrounding ideas (represented by contiguous words or
word strings) and then identifying other words and/or word strings in the same language
that have similar patterns, thus enabling the system to identify words and word strings
that are semantically equivalent to (or have some other semantic relationship to) the
original (query) word or word string. Knowledge acquisition in a single state or language
uses one embodiment of the present invention's method for performing Common
Frequency Analysis (CFA). In general, Common Frequency Analysis is the method of the
present invention that associates two or more words and/or word strings with one another
and with other, third words and word strings.
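
The single-language acquisition step described above can be approximated with a small sketch: record the word strings that surround each occurrence of the query, then look for other word strings that occur between the same neighbors. This is only one possible reading of the paragraph, not the invention's Common Frequency Analysis as such; the two-word context window, the `max_len` cap on candidates, and all names are assumptions.

```python
from collections import Counter

def context(words, start, length, window):
    """Words immediately left and right of a span, as a pair of tuples."""
    left = tuple(words[max(0, start - window):start])
    right = tuple(words[start + length:start + length + window])
    return left, right

def similar_word_strings(sentences, query, max_len=3, window=2):
    """Rank word strings that occur in the same contexts as `query`."""
    tokenized = [s.lower().split() for s in sentences]
    q = query.lower().split()

    # Step 1: collect the contexts that surround the query.
    query_contexts = set()
    for words in tokenized:
        for i in range(len(words) - len(q) + 1):
            if words[i:i + len(q)] == q:
                query_contexts.add(context(words, i, len(q), window))

    # Step 2: count other word strings found in those same contexts.
    scores = Counter()
    for words in tokenized:
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                span = words[i:i + n]
                if span != q and context(words, i, n, window) in query_contexts:
                    scores[" ".join(span)] += 1
    return scores.most_common()

sentences = [
    "you must accept the terms and conditions before ordering",
    "you must accept the terms of service before ordering",
    "please read the terms and conditions before signing",
    "please read the rules and regulations before signing",
    "we saw the red car before lunch",
]
candidates = similar_word_strings(sentences, "terms and conditions")
```

With these five sentences, `candidates` contains "terms of service" and "rules and regulations", each seen once in a context shared with the query, while "red car" is excluded because its neighbors differ.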

The knowledge reconstruction aspect of the present invention that connects
contiguous data segments, represented by word strings in this embodiment, is the
dual-anchor overlap technique. This aspect of the invention assembles contiguous word
strings by connecting only word strings that have overlapping words (or word strings)
with the word strings both to their left and to their right. The system can use the
dual-anchor overlap to connect contiguous known building block word strings in combinations
not yet encountered by the system to generate new complex ideas or represent known
ideas in new forms. The dual-anchor overlap technique of the present invention is used
to connect ideas represented by word strings (or other data segments) in order to translate
documents across two languages as well as to connect contiguous concepts within a
single language.
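
The overlap requirement can be sketched as follows: a sequence of word strings is joined only if each one shares at least one word with its neighbor, so that every interior word string is anchored on both its left and its right. The function names and the choice to return `None` when an anchor is missing are assumptions for illustration; in a translation setting the inputs would be Target Language word strings retrieved from the database.

```python
def overlap_len(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0

def dual_anchor_join(word_strings):
    """Join word strings that overlap with their neighbors.

    Returns the assembled string, or None if any adjacent pair fails to
    overlap (that is, if some word string lacks a left or right anchor).
    """
    if not word_strings:
        return None
    chunks = [ws.split() for ws in word_strings]
    result = list(chunks[0])
    for prev, cur in zip(chunks, chunks[1:]):
        k = overlap_len(prev, cur)
        if k == 0:
            return None
        result.extend(cur[k:])  # append only the non-overlapping tail
    return " ".join(result)

joined = dual_anchor_join(["I want to buy", "to buy a car", "a car tomorrow"])
rejected = dual_anchor_join(["I want to buy", "a car tomorrow"])
```

In the first call the middle string "to buy a car" overlaps "to buy" on its left and "a car" on its right, so the three pieces assemble into one sentence; in the second call there is no shared word, so nothing is assembled.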

The knowledge generation aspect of the present invention allows a user to set 
triggers for next steps based on the co-occurrence of associated third word strings shared 
by two different word strings found within general proximity of each other (Common 
Frequency Analysis). This knowledge generation aspect will enable Strong AI 
applications. The system uses CFA to trigger next-step CFAs in a chain of logic designed 
by the user to solve a general class of problems. The system will analyze a question or 
statement by parsing it into all possible sets of known word strings. The system will then 
analyze the different potential combinations of word strings to identify a known pattern 
(i.e., two or more words and/or word strings expressed together in a certain order) that 
will trigger the next step(s) in the analysis. 
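
Parsing a statement into its possible sets of known word strings can be illustrated with a short recursive sketch. It enumerates only non-overlapping segmentations, which is a simplifying assumption; the `known` set and the function name are hypothetical.

```python
def segmentations(words, known):
    """Return every way to split `words` into consecutive known word strings."""
    if not words:
        return [[]]
    results = []
    for n in range(1, len(words) + 1):
        head = " ".join(words[:n])
        if head in known:
            # Keep this head and segment whatever follows it.
            for rest in segmentations(words[n:], known):
                results.append([head] + rest)
    return results

# Hypothetical set of word strings the system has already learned.
known = {
    "between", "a", "rock", "band",
    "a rock", "rock band", "a rock band", "between a rock",
}
parses = segmentations("between a rock band".split(), known)
```

For this input the sketch finds five segmentations, ranging from four single words to groupings such as "between" followed by "a rock band"; a later step would then test each combination against known trigger patterns.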

B. Methods and Systems 

In the field of machine translation, the system uses any of the several methods for 
cross-language knowledge acquisition of word string translations, and combines those 
translations using the knowledge reconstruction method. This significantly improves
upon the quality of existing translation technology and systems and represents an advance
over the present state of the art.

One method for cross-language knowledge acquisition can occur by use of 
documents in two or more languages. The documents can be exact translations of each 
other, i.e., "Parallel Text" documents, or can be text in two languages concerning the
same subject matter, i.e., "Comparable Text" documents. This acquisition can occur 
directly between the Source and the Target Languages (with Parallel or Comparable 
Text). As used for language translation, the system automatically builds a cross-language 
database of semantically equivalent ideas (expressed in words or word strings) across two 
languages.

One embodiment of this method and system of the present invention selects at 
least a first and a second occurrence of all words and word strings that have a plurality of 
occurrences in the first language (Source Language) in the available cross-language 
documents. It then selects a first word range and a second word range in the second
language (Target Language) documents, wherein these Target Language ranges
approximately correspond to the locations of the first and second occurrences of the
selected word or word string in the Source Language documents (and hence provide a
high probability of containing the translation of the Source words or word strings). Next,
looking at just the ranges in the Target Language, the system compares words and word
strings found in the first word range with words and word strings found in the second
word range (along with all other Target word ranges that correspond to additional 
occurrences of each word or word string in the Source Language) and, locating words 
and word strings common to different word ranges, stores those located common words 
and word strings in the cross-idea database. The invention then associates, in the cross-
idea database, the common words and word strings located in the ranges in the Target
Language with the selected word or word string in the Source Language, ranked by their
association frequency (number of recurrences), after adjusting the association frequencies
as detailed in Figure 1. By identifying the co-occurrences of words and word strings
across languages in Parallel or Comparable Texts, the system will identify more 
associations as more Parallel or Comparable Text becomes available. 
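
By way of illustration only, the range-comparison step of this embodiment can be sketched in Python. The sketch assumes whitespace tokenization, sentence-aligned text pairs, and a Target Language range centered on the position proportional to the Source Language occurrence. The names `associate`, `target_range`, `window` and `max_len` are hypothetical, and the frequency adjustment of Figure 1 is not reproduced.

```python
from collections import Counter

def ngrams(tokens, max_len):
    """Return the set of all word strings of 1..max_len words in tokens."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }

def target_range(src_tokens, tgt_tokens, src_index, window):
    """Slice of the target text around the position proportional to src_index."""
    center = round(src_index * len(tgt_tokens) / max(len(src_tokens), 1))
    return tgt_tokens[max(0, center - window):center + window + 1]

def associate(pairs, query, window=3, max_len=3):
    """Count target word strings shared by the ranges of different occurrences."""
    q = query.split()
    range_sets = []
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        for i in range(len(s) - len(q) + 1):
            if s[i:i + len(q)] == q:
                range_sets.append(ngrams(target_range(s, t, i, window), max_len))
    counts = Counter()
    for word_strings in range_sets:
        counts.update(word_strings)  # one count per range containing the string
    # Keep only word strings common to at least two ranges, most frequent first.
    return [(w, c) for w, c in counts.most_common() if c >= 2]

pairs = [
    ("the red house is big", "la casa roja es grande"),
    ("I see the red house", "veo la casa roja"),
    ("a red house", "una casa roja"),
]
ranked = dict(associate(pairs, "red house"))
```

In this toy corpus the word string "casa roja" appears in all three Target Language ranges, while "es" appears in only one and is therefore not stored as an association.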

Once associations are made based on frequency of words and word strings in the 
Target Language ranges, those potential Target Language word string translations can be 
further verified by finding ranges corresponding to them back in the Source Language
documents. The system can then find the most frequent words and word strings within 
the Source Language ranges to see if the original selected word or word string is among 
the most frequent Source Language words and word strings resulting from this reverse 
learning process. 

By automatically building translations between frequently recurring word strings
(without regard to the size of a word string) in Parallel Text, the present invention
captures translations with the necessary built-in context for each word in the string.
These accurate translations of word strings with built-in context provide the building
blocks that can be used in different appropriate combinations (using the knowledge
reconstruction aspect of the present invention) to translate documents. As the system
learns word string translations, they will be stored in a data repository for much faster 
translation when they're needed again for the future translation of documents. The 
system can operate on documents to learn recurring word strings as they occur 
sequentially in examined Parallel Text, or recurring word strings can be learned based on
specific Parallel documents entered into the system that have been selected because they 
contain words in the Source Language that need to be translated into the Target 
Language. The latter operation is a form of "learning by doing" and is an example of 
learning on-the-fly.

The present invention also provides a cross-language knowledge acquisition 
method and apparatus that uses databases automatically built by the present invention in 
different languages together in the aggregate to deduce word string translations between 
two languages not yet learned directly through Parallel Text. This multilingual leverage 
technique of the present invention uses the common results that are generated indirectly
by translating from the Source Language into known word string translations in 
intermediate languages, and then from the intermediate languages into the Target 
Language. 
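
As an illustrative sketch only, the multilingual leverage technique can be expressed as a vote across intermediate languages. The function name `leverage`, the table layout, and the `min_agree` threshold are assumptions made for this example and are not taken from the specification.

```python
from collections import Counter

def leverage(source, src_to_mid, mid_to_tgt, min_agree=2):
    """Rank target candidates by how many intermediate languages reach them."""
    votes = Counter()
    for lang, table in src_to_mid.items():
        reached = set()
        for mid in table.get(source, []):
            reached.update(mid_to_tgt.get(lang, {}).get(mid, []))
        votes.update(reached)  # at most one vote per intermediate language
    return [(t, n) for t, n in votes.most_common() if n >= min_agree]

# Hypothetical word string tables: Source -> intermediate, intermediate -> Target.
src_to_mid = {
    "fr": {"thank you": ["merci"]},
    "de": {"thank you": ["danke"]},
    "it": {"thank you": ["grazie"]},
}
mid_to_tgt = {
    "fr": {"merci": ["gracias", "merced"]},
    "de": {"danke": ["gracias"]},
    "it": {"grazie": ["gracias", "gracia"]},
}
result = leverage("thank you", src_to_mid, mid_to_tgt)
```

Only the candidate reached through several intermediate languages survives the threshold; candidates reached through a single language are discarded as unconfirmed.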

This same multilingual leverage technique for cross-language knowledge 
acquisition using translations through intermediate third languages and then into the
Target Language can also be employed using any state of the art machine translation
system between these languages. Even though the accuracy levels of these systems are
low when used individually and fewer common results will be reached in the Target
Language through intermediate third languages, when several results are identical, the
translation will have a high degree of confirmed accuracy. Moreover, these results can be
confirmed by requiring contiguous word string translations to have large overlaps (e.g., 
two, three, or four-word word string overlap on each side) in the Target Language using 
the dual-anchor overlap process before being approved. 

The next method of the present invention for cross-language knowledge
acquisition builds associations between word strings of different languages using a 
monolingual corpus in the Target Language and/or Parallel Text, along with any one or 
more of the following: machine translation systems known in the art, cross-language 
dictionaries known in the art, and/or custom-built cross-language dictionaries. These 
methods of the present invention use a technique called "Flooding" whereby all available 
translations for each word in a Source Language word string (Target translations may be 
words or phrases) are generated using custom-built dictionaries or systems known in the 
art (oftentimes producing multiple translation possibilities for each word, even if some or 
all of the translation possibilities don't apply in that particular context). Different 
combinations of these word-for-word (and/or word-for-phrase) translation possibilities 
are used to search Target Language documents (either a monolingual corpus or Parallel 
Text) to identify translation candidates for a Source Language word string. The process is 
called "Flooding" because Target Language documents are "flooded" with these word- 
for-word (and/or word-for-phrase) combinations. The Flooding method for word string 
translations requires more calculations than cross-language learning with Parallel Text 
but, because it does not require Parallel Text to build word string translations, it provides 
more translation coverage of language. 
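
The Flooding step can be sketched as follows, purely as an illustration. The sketch assumes a small word-for-word dictionary, a monolingual Target Language corpus given as a list of sentences, and that every ordering of each translation combination is tried as a contiguous word string. The name `flood` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations, product

def flood(source_string, dictionary, corpus):
    """Count corpus hits for every word-for-word translation combination."""
    options = [dictionary.get(w, []) for w in source_string.split()]
    # Every combination of per-word translations, in every word order.
    candidates = {
        " ".join(order)
        for combo in product(*options)
        for order in permutations(combo)
    }
    hits = Counter()
    for candidate in candidates:
        # Pad with spaces so only whole-word, contiguous matches count.
        n = sum(f" {candidate} " in f" {sentence} " for sentence in corpus)
        if n:
            hits[candidate] = n
    return hits.most_common()

dictionary = {"red": ["rojo", "roja", "tinto"], "wine": ["vino"]}
corpus = [
    "me gusta el vino tinto",
    "el vino tinto es caro",
    "un coche rojo",
]
candidates = flood("red wine", dictionary, corpus)
```

Most combinations never occur in the corpus; the one that does occur is retained as a translation candidate for the Source Language word string. The number of combinations grows quickly with the length of the word string, which is consistent with the additional calculation noted above.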

In addition to the acquisition of knowledgebases, the dual-anchor overlap 
technique of the present invention reconstructs larger ideas using the entries of the 
knowledgebase (i.e., pieces together smaller units into coherent larger units). Thus, the 
present invention also provides a method and apparatus for converting an entire 
document from one language or state to another language or state using the building 
block ideas expressed in different word strings across two languages. The present
invention is either provided with or builds a database comprised of data segments in a 
Source Language associated with data segments in a Target Language. The present 
invention translates text by using the cross-language word string translation database and 
only approving translations of word strings that have an overlapping word or word string
on both sides (unless it is a first or last word string in the translated segment) in both the 
Source and Target Languages. 

In a preferred embodiment, the present invention translates text by accessing the 
above-referenced database, and identifying the longest word string in the database that is
also in the document to be translated (measured by number of words) beginning with the
first word of the document. The system then retrieves from the database a word string in 
the Target Language associated with the located word string from the Source Language. 
The system then selects a second word string (from the document to be translated) that 
exists in the database and has an overlapping word or word string with the previously
identified word string in the document, and retrieves from the database a word string in
the Target Language associated with the second word string in the Source Language. If 
the word string associations in the Target Language have an overlapping word or word 
string, the word string associations in the Target Language are combined (eliminating 
redundancies in the overlap) to form a translation; if not, other Target Language
associations to the Source Language word string are retrieved from the database (or
learned on-the-fly) and tested for combination through an overlap of words until 
successful. Obviously, if overlapping word string translations in the Target Language 
cannot be identified or learned, other (shorter or longer) alternative overlapping word 
strings in the Source Language can be used and their respective Target Language
associations tested for overlap until successful. The next word string in the document in 
the Source Language is selected by finding the longest word string in the database that 
has an overlapping word or word string with the previously identified Source Language 
word string, and the above process continues until the entire Source Language document 
is translated into a Target Language document. Only word strings with an overlapping 
word or words with contiguous word strings on both left and right sides in both the 
Source and Target Languages are approved as a combined set of ideas for translation. 
The beginning and the end of the chain of overlapping word string translations can be 
defined by the beginning and end of a sentence, or by any other identifiable unit of text 
(e.g., phrase, title, paragraph, article, chapter, etc.). 
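
A minimal sketch of the overlap test and combination step is given below for illustration. It assumes the Source Language segments have already been selected and takes them as input, so the longest-match selection described above is not implemented. The names `overlap`, `join_chain` and `min_overlap`, and the sample database, are hypothetical.

```python
from itertools import product

def overlap(left, right):
    """Length of the longest suffix of left that is also a prefix of right."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0

def join_chain(segments, database, min_overlap=1):
    """Join translations of overlapping source segments when the targets overlap too."""
    src = [s.split() for s in segments]
    # Source side: each neighbouring pair of segments must overlap.
    if any(overlap(a, b) < min_overlap for a, b in zip(src, src[1:])):
        return None
    # Target side: try stored candidates until every neighbouring pair overlaps.
    for choice in product(*(database.get(s, []) for s in segments)):
        tgt = [c.split() for c in choice]
        ks = [overlap(a, b) for a, b in zip(tgt, tgt[1:])]
        if all(k >= min_overlap for k in ks):
            out = list(tgt[0])
            for k, nxt in zip(ks, tgt[1:]):
                out.extend(nxt[k:])  # drop the redundant overlapping words
            return " ".join(out)
    return None

database = {
    "I want to buy": ["quiero comprar"],
    "to buy a car": ["comprarse un coche", "comprar un coche"],
}
segments = ["I want to buy", "to buy a car"]
translation = join_chain(segments, database)
```

Here the first Target Language candidate for the second segment is rejected because it shares no word with the preceding translation, and the second candidate is accepted and merged on the shared word. Raising `min_overlap` makes approval stricter, at the cost of rejecting more combinations.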

The above described cross-language dual-anchor overlap method and process 
increases the likelihood of combining each word string translation with an appropriate 
contiguous word string in terms of context and grammar. The number of overlapping 
words required to approve a connection of contiguous segments is user-defined. The 
higher the user-defined minimum number of overlapping words between contiguous 
segments required to approve the combination of word strings, the more accurate the 
results. The cross-language dual-anchor overlap technique resolves the issue of 
"boundary friction" confronted by existing EBMT systems and increases the likelihood 
of the correct context being used throughout a translation. 

Additionally, word string translations that are candidates based on cross-language 
learning (or other knowledge acquisition methods) but not yet confirmed by user-defined 
statistical significance, can be approved by requiring more overlapping words between 
two contiguous word strings as a user-defined requirement. Longer unconfirmed word
string translation candidates can also be tested for a cross-language overlap by using a 
smaller subset word string (i.e., internal word string) that has a known translation to 
confirm the middle un-overlapped portion of a longer word string. Note that the 
translation method is not limited to word strings of equal length or word strings in the 
same position in both the Source and Target Language sentences and is, therefore, very 
flexible. 

The present invention also provides a general method and apparatus referred to as 
Frequency Association Database creation to create frequency tables of proximity 
relationships between words and/or word strings in a single language. These proximity 
relationships are then used to make associations between a word or word string and other 
words and/or word strings based on common associations within a single language 
through the present invention's Common Frequency Analysis. The method of the present 
invention for knowledge acquisition within a single language uses the context 
(represented by words and word strings) surrounding each recurring idea (which are also 
represented by words or word strings). Semantic relationships can be identified and 
utilized to significantly improve search and text mining applications, machine translation 
and artificial intelligence applications. 

The present invention allows the acquisition of knowledgebases within a single 
state, such as a single language, using the Common Frequency Analysis method of the 
present invention. In one embodiment using Common Frequency Analysis, the system 
identifies words and word strings that represent synonymous ideas, as well as other types 
of relationships between ideas. 

For example, by examining texts in the English language, associations can be
established for words or word strings that identify semantically equivalent (i.e., 
synonymous) words and word strings (e.g., "nation's largest" and "biggest in the 
country"). The present invention also provides a method and apparatus to analyze a word 
or word string for word and word string associations and to produce words and word 
strings representing opposite ideas (where they exist), as well as words and word strings 
representing definitions, examples, and other related ideas including members of a 
common general class of ideas (e.g., "red" relates to "blue" and "lime green" as members 
of the class of colors), and other related information (e.g., the query "Mount Everest" 
may return "highest point in the world"). 

The present invention identifies these relationships between and among words 
and/or word strings by identifying the word strings of any size that are contiguous to the 
word or word string being analyzed, and whether these contiguous word strings are to the 
left or right of the analyzed word or word string. Words and word strings that share 
many of the same left and right contiguous word strings have strong semantic 
relationships with one another. Typically, the words and word strings that share the greatest
number of different right and left context word strings, including longer (more words)
right and left context word strings, are most semantically similar or otherwise 
semantically related. 
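
The use of shared left and right context word strings can be illustrated with the following sketch. It assumes whitespace tokenization, a small in-memory corpus, and a simple score equal to the number of shared left contexts plus the number of shared right contexts. The names `contexts`, `related`, `max_len` and `ctx_len` are hypothetical, and the scoring is a simplification rather than the ranking used by the invention.

```python
from collections import defaultdict

def contexts(corpus, max_len=2, ctx_len=2):
    """Map each word string to its sets of left and right contiguous word strings."""
    left, right = defaultdict(set), defaultdict(set)
    for sentence in corpus:
        t = sentence.split()
        for n in range(1, max_len + 1):
            for i in range(len(t) - n + 1):
                key = " ".join(t[i:i + n])
                for c in range(1, ctx_len + 1):
                    if i - c >= 0:
                        left[key].add(" ".join(t[i - c:i]))
                    if i + n + c <= len(t):
                        right[key].add(" ".join(t[i + n:i + n + c]))
    return left, right

def related(query, corpus, max_len=2, ctx_len=2):
    """Rank other word strings by the left and right contexts they share with query."""
    left, right = contexts(corpus, max_len, ctx_len)
    scores = {}
    for key in set(left) | set(right):
        if key == query:
            continue
        shared_left = len(left[query] & left[key])
        shared_right = len(right[query] & right[key])
        if shared_left and shared_right:  # require shared context on both sides
            scores[key] = shared_left + shared_right
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

corpus = [
    "the largest city in the state",
    "the biggest city in the state",
    "the largest company in town",
    "the biggest company in town",
    "a small city in the state",
]
matches = related("largest", corpus)
```

In this toy corpus "biggest" shares one left context and four right contexts with "largest", whereas "small" shares right contexts only and is excluded by the two-sided requirement.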

Knowledge acquired and assembled in a single language database (including 
knowledge generated on-the-fly) can be used to expand keyword search and text mining 
methods known in the art. These methods can be enhanced, for example, by searching 
semantic equivalents of keywords as well as other closely related words and word strings 
to the entered keywords. The aspect of the present invention that identifies semantically
equivalent terms by identifying common left and right context word strings can also be 
used to break semantic codes. If an otherwise inappropriate or unusual (in that context) 
word or word string is used as a code to represent a meaning other than its common 
meaning or meanings, its repeated use in an unusual context will allow the present
invention to identify the true semantic meaning that underlies the semantic code. 

Appendix A (page 179) presents examples of association results using RCFA for 
a variety of queries. The first 15 examples show partial results for the queries (i.e., the top 
20-25 returns per query), while the final example (for the query "it is important to note")
shows all 1000 returns. The results reflect a far more robust automated semantic
acquisition method than any in the state of the art. The key to these results is treating
word strings flowing into (i.e., to the left of the query, in English) and out of (i.e., to the
right of the query, in English) the query idea as single units of context, and using that
two-sided word string context to find other semantic units represented by words and word
strings that share some of those same left and right side word string contexts.

Using the dual-anchor overlap technique of the present invention, the same ideas 
represented by different word strings in the same language can also be substituted for one 
another in a chain of overlapping ideas to produce a plurality of sentences consisting of 
overlapping semantically equivalent ideas that combine to express the same larger idea.
By providing a database of semantically equivalent ideas in a language along with the
dual-anchor overlap technique of the present invention (described above for translation 
across languages), the present invention can reproduce the same larger idea in many 
different derivations. This dual-anchor overlap, the knowledge reconstruction component 
of the present invention, will be very useful for voice recognition and other natural
language recognition applications and provide expanded search combinations of the same 
idea expressed in various word string combinations. This ability will also provide very 
effective methods for text mining tasks such as entity and relation co-reference and 
tracking, among other tasks.

The aspect of the single language knowledge acquisition methods of the present 
invention that generates semantic equivalents can also be used as a productive component 
in machine translation applications. A Source Language word string that cannot be 
translated because of a lack of information or for any other reason can be used to generate 
alternative Source Language word strings to be translated in its place. Additionally,
semantic equivalents of word strings in the Source Language and/or semantic equivalents
of a Target Language word string translation candidate can be used to help confirm 
correct translations. 

The present invention also provides a Common Frequency Analysis method and 
apparatus that uses relationships between recurring words and/or word strings in any
number of ways in smart applications to answer questions by identifying associations to
third words and/or word strings that two or more words or word strings have in common,
based on their proximity to one another in text. Databases created for smart applications
can be built from documents in a single language (or alternatively using cross-language
text). The presence of two or more words and/or word strings that are contiguous or
overlapping (or possess some other close proximity relationship) in a question, request, or
statement can trigger different types of Common Frequency Analysis of the present 
invention designed by the user or learned by the system. 

The triggered Common Frequency Analyses will identify words and word strings
not present in the question, request, or statement that share a proximity relationship in 
other available text with two or more words and/or word strings presented to the system 
in the question, request, or statement. These third word or word string associations 
common to both presented words and/or word strings may be used to identify the next
steps in the chain of Common Frequency Analyses to understand questions or commands, 
and provide answers or perform tasks. 
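
The identification of third words associated with two presented words can be sketched as follows, for illustration only. The sketch assumes that "proximity" means occurring within a fixed number of tokens in the same sentence, that single words rather than word strings are returned, and that a caller-supplied stop list removes function words. The names `proximity_counts`, `common_associations`, `window` and `stop` are hypothetical.

```python
from collections import Counter

def proximity_counts(term, corpus, window=5):
    """Count words found within `window` tokens of each occurrence of term."""
    q = term.split()
    counts = Counter()
    for sentence in corpus:
        t = sentence.split()
        for i in range(len(t) - len(q) + 1):
            if t[i:i + len(q)] == q:
                near = t[max(0, i - window):i] + t[i + len(q):i + len(q) + window]
                counts.update(near)
    return counts

def common_associations(term_a, term_b, corpus, window=5, stop=frozenset()):
    """Rank third words associated with both terms, excluding the terms themselves."""
    a = proximity_counts(term_a, corpus, window)
    b = proximity_counts(term_b, corpus, window)
    skip = set(term_a.split()) | set(term_b.split()) | set(stop)
    shared = {w: min(a[w], b[w]) for w in a.keys() & b.keys() if w not in skip}
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))

corpus = [
    "Paris is the capital of France",
    "the capital of France is Paris",
    "Berlin is the capital of Germany",
    "France exports wine",
]
answer = common_associations("capital", "France", corpus, stop={"is", "the", "of"})
```

For a question containing "capital" and "France", the only remaining word associated with both in this toy corpus is "Paris"; words associated with just one of the two, such as "Berlin" or "wine", are not returned.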

The present invention provides a method for Strong AI tasks by providing a basis 
for dynamic, automatic knowledgebase creation by levels and categories of semantic
association of any ideas expressed as words or word strings in context. Provided
adequate training text is available, this ability provides a knowledgebase for all situations
that can be leveraged by smart application triggers. 

In a sense, the user trains the present invention how to think about a class of 
situations represented by general patterns of ideas by building next step "triggers" for the
system to use when certain known patterns of words and/or word strings are identified
based on the semantic classes they are a part of (as identified by the present invention's
Common Frequency Analysis for semantic equivalents and equivalence classes). By
recognizing general classes of ideas through their particular identifying pattern of words
and word strings (and/or known semantic equivalents), and by identifying the presence of
a group of those ideas that fit a larger generalized pattern, the system can trigger
strategies (once trained by the user to do so), executing logical next steps (knowledgebase
lookups or next Common Frequency Analyses) when those general patterns are
identified. Once the user creates enough "general strategy triggers," the system will learn
to identify appropriate triggers automatically for many other situations. These initial
triggers set by the user can include triggers designed to teach the system to automatically 
set triggers for different purposes. 

Another object of the present invention is to associate sound wave 
frequencies produced by human speech and other sources to their corresponding ideas in
each different language to be used in voice recognition and other applications that rely on 
interpretation of audible sounds. 

Another object of the present invention is to associate generalized patterns of 
pixel arrays and other methods for visual data representation with the corresponding ideas 
represented by different languages to be used in visual recognition for information
gathering and artificial intelligence applications. 

Another aspect of the present invention is to represent semantically identical ideas 
using a single symbol or token like a number or a point on the electromagnetic spectrum 
which can be used as a data compression method. 

IV. PRIOR ART 

Prior art systems do not accomplish what is described by the present invention. 
For example: 

USP 5,724,593 to Hargrave discloses a translation memory tool to assist human 
translators, where texts and corresponding translations are loaded into a memory. The
texts in the Source Language are parsed into n-grams. The Source Language n-grams are
analyzed to determine frequency of occurrence within texts of the Source Language and
entropy weightings are assigned. N-grams having excessively high or low entropy
weightings are eliminated as being insufficiently useful for translation purposes. The
remaining n-grams and corresponding translations are used in a reverse index for
machine-assisted translation by finding "fuzzy matches" for input translations that exist
in the translation memory for the human translator's review.

Hargrave does not perform word string association analysis using Parallel Text 
where recurring word strings of any size in the Source Language documents are 
associated with recurring words and word strings of any size in the Target Language
documents based on their frequency of appearance (after subtraction of larger word 
strings from sub-strings) in the same approximate location of one another within the 
Parallel Text. Hargrave does not use translation of words and word strings indirectly 
through other third languages. 

Hargrave does not "Flood" Target Language text with Source Language word 
translations that make up Source Language word strings along with Source Language 
context words and word strings. Hargrave does not perform word and word string 
association analysis between words and word strings of a single language using word 
strings of any size to the left and right of the query. Hargrave does not require that a 
document input to be translated be parsed into overlapping word strings in the Source 
Language and require that Target Language translations of Source Language parsed word 
strings also have overlapping words or word strings with its neighboring translations to 
its left and/or right to approve a translation. 

USP 6,085,162 to Cherny discloses a three-dimensional topical database for 
translating between languages, where each layer of the database represents a user- 
selectable topic relevant to the translation. The database is built by parsing texts 
representing at least two different language sources into words. In separate branches of a
processing sequence, the parsed words from the two sources are assigned to different 
classes based in part on information such as their grammatical function, grammatical 
form and denotation. The input words in each branch are then translated using a dual-
language dictionary to produce one or more translations, or associations, for each word.
The word associations from each branch are processed together to produce forward and 
backward frequency of association using, for example, a neural network. The database 
used for translation is made up of layers, each representing a topic, each layer containing 
the frequency of association and assigned classes for all words within the topic. 

Cherny does not perform word string association analysis using Parallel Text
where recurring word strings of any size in the Source Language documents are
associated with recurring words and word strings of any size in the Target Language
documents based on their frequency of appearance (after subtraction of larger word
strings from sub-strings) in the same approximate location of one another within the
Parallel Text. Cherny does not use translation of words and word strings indirectly
through other third languages. Cherny does not "Flood" Target Language text with
Source Language word translations that make up Source Language word strings along
with Source Language context words and word strings. Cherny does not perform word
and word string association analysis between words and word strings of a single language
using word strings of any size to the left and right of the query. Cherny does not require
that a document input to be translated be parsed into overlapping word strings and require
that Target Language translations of Source Language parsed word strings also have
overlapping words or word strings with its neighboring translations to its left and/or right
to approve a translation. 

USP 5,867,811 to O'Donoghue teaches the use of word pair frequencies to 
improve the quality of aligned corpora generated by other methods known in the art by 
modifying the aligned corpora to remove the most improbable corpora alignments. 
Aligned corpora are two or more bodies of text divided into aligned portions, such that 
each portion in a first language corpus is mapped onto a corresponding portion in a 
second language corpus. Each portion may comprise a single sentence or phrase, but can 
also comprise one word or perhaps a whole paragraph. Automated systems to produce 
aligned corpora known in the art are not always reliable. The invention employs a 
statistical database containing frequency tables for the occurrence of pairs of 
corresponding individual words across two languages to detect probable errors in aligned 
text portions. The invention also uses a statistical method to provide an alignment score 
for "chunks of words" by accumulating the individual word pair scores for all the word 
pairs in each pair of chunks. 

O'Donoghue does not perform word string association analysis using Parallel 
Text where recurring word strings of any size in the Source Language documents are 
associated with recurring words and word strings of any size in the Target Language
documents based on their frequency of appearance (after subtraction of larger word 
strings from sub-strings) in the same approximate location of one another within the 
Parallel Text. O'Donoghue does not "Flood" Target Language text with Source 
Language word translations that make up Source Language word strings along with 
Source Language context words and word strings. O'Donoghue does not use translation 
of words and word strings indirectly through other third languages. O'Donoghue does not
perform word and word string association analysis between words and word strings of a 
single language using word strings of any size to the left and right of the query. 
O'Donoghue does not require that a document input to be translated be parsed into 
overlapping word strings and require that Target Language translations of Source
Language parsed word strings also have overlapping words or word strings with its 
neighboring translations to its left and/or right to approve a translation. 

United States Patent No. 5,579,224 to Hirakawa teaches a system for creating a 
dictionary. A first language document and a second language document are loaded into
memory. A word or character string is extracted from the first language document and
corresponding words are selected from the second language document based on 
morphological and syntactic analysis of words in the second language document. 
Selected candidate words from the second language document are compared to the 
extracted word from the first language document by comparing words near the extracted
word in the first document to words near the candidate selected words in the second
language document. The candidate words are scored based on context and proximity. 

Hirakawa does not perform word string association analysis using Parallel Text 
where recurring word strings of any size in the Source Language documents are 
associated with recurring words and word strings of any size in the Target Language
documents based on their frequency of appearance (after subtraction of larger word
strings from sub-strings) in the same approximate location of one another within the 
Parallel Text. Hirakawa does not "Flood" Target Language text with Source Language 
word translations that make up Source Language word strings along with Source 
Language context words and word strings. Hirakawa does not use translation of words
and word strings indirectly through other third languages. Hirakawa does not perform 
word and word string association analysis between words and word strings of a single 
language using word strings of any size to the left and right of the query. Hirakawa does 
not require that a document input to be translated be parsed into overlapping word strings 
and require that Target Language translations of Source Language parsed word strings 
also have overlapping words or word strings with its neighboring translations to its left 
and/or right to approve a translation. 

United States Patent No. 5,991,710 to Papineni discloses a system for 
translating from a Source Language to a Target Language by statistically scoring Target 
candidate word sets in the Target Language and identifying candidate Target word sets 
with the highest score. The system uses a statistical model to choose the most probable 
translation among the Target Language candidates and is designed for applications where 
the domain is substantially restricted to a finite number of potential translations that will 
fit the input query. 

Papineni does not perform word string association analysis using Parallel Text
where recurring word strings of any size in the Source Language documents are
associated with recurring words and word strings of any size in the Target Language
documents based on their frequency of appearance (after subtraction of larger word
strings from sub-strings) in the same approximate location of one another within the
Parallel Text. Papineni does not "Flood" Target Language text with Source Language
word translations that make up Source Language word strings along with Source
Language context words and word strings. Papineni does not use translation of words
and word strings indirectly through other third languages. Papineni does not perform word
and word string association analysis between words and word strings of a single language
using word strings of any size to the left and right of the query. Papineni does not require
that a document input to be translated be parsed into overlapping word strings and require
that Target Language translations of Source Language parsed word strings also have
overlapping words or word strings with its neighboring translations to its left and/or right 
to approve a translation. 

United States Patent No. 6,092,034 to McCarley discloses a statistical 
translation system and method for fast sense disambiguation and translation using fertility 
models and sense models using the individual words of the Source Language. The 
fertility model is a language model for describing the probability of a fertility of a Source 
Language word, given the Source Language word and the context of the Source 
Language word using methods known in the art such as the maximum-entropy tri-gram 
model. The sense model is a language model for describing the probability of a Target 
Language word being the correct translation of a Source Language word, given the 
Source Language word and the context of the Source Language word using the tri-gram 
model and other methods known in the art. 

McCarley does not perform word string association analysis using Parallel Text 
where recurring word strings of any size in the Source Language documents are 
associated with recurring words and word strings of any size in the Target Language 
documents based on their frequency of appearance (after subtraction of larger word 
strings from sub-strings) in the same approximate location of one another within the 
Parallel Text. McCarley does not "Flood" Target Language text with Source Language 
word translations that make up Source Language word strings along with Source 
Language context words and word strings. McCarley does not use translation of words 
and word strings indirectly through other third languages. McCarley does not perform 
word and word string association analysis between words and word strings of a single 
language using word strings of any size to the left and right of the query. McCarley does 
not require that a document input to be translated be parsed into overlapping word strings 
and require that Target Language translations of Source Language parsed word strings 
also have overlapping words or word strings with its neighboring translations to its left 
and/or right to approve a translation. 

USP 6,393,389 to Chanod discloses a method for translating text by parsing the 
Source text into sub-segments. The sub-segments are then translated to a Target 
Language using any of a number of conventional means known in the art. Any sub- 
segment that has multiple translation choices, either because it was translated using a 
plurality of means or the method used to translate it provided multiple choices, has those 
choices ranked by a user-defined method. An attempt at conveying the meaning of the 
Source input in the Target Language is then made by presenting to the user a word string 
created by combining the highest ranking candidate for each segment consecutively. In 
alternative embodiments, the user may swap out segments for lower ranking segments or 
multiple choices for a segment can be displayed. 

Chanod does not perform word string association analysis using Parallel Text 
where recurring word strings of any size in the Source Language documents are 
associated with recurring words and word strings of any size in the Target Language 
documents based on their frequency of appearance (after subtraction of larger word 
strings from sub-strings) in the same approximate location of one another within the 
Parallel Text. Chanod does not "Flood" Target Language text with Source Language 
word translations that make up Source Language word strings along with Source 
Language context words and word strings. Chanod does not use translation of words and 
word strings indirectly through other third languages. Chanod does not perform word and 
word string association analysis between words and word strings of a single language 
using word strings of any size to the left and right of the query. Chanod does not require 
that a document input to be translated be parsed into overlapping word strings and require 
that Target Language translations of Source Language parsed word strings also have 
overlapping words or word strings with its neighboring translations to its left and/or right 
to approve a translation. 

USP 6,138,085 to Richardson discloses a system for determining, for a semantic 
relation that does not occur in a lexical knowledgebase, whether this semantic 
relationship should be inferred despite its absence from the lexical knowledgebase. 
Richardson only seeks to define relationships between single words. The relationship 
between two presented words is placed into one of a limited number of manually defined 
categories (e.g., Synonym, Location, User, etc.) by finding one or more pathways 
between the words. The pathways are composed of other words which are already 
connected in the database by manually tagged or deduced relationships. 

Richardson does not perform word string association analysis using Parallel Text 
where recurring word strings of any size in the Source Language documents are 
associated with recurring words and word strings of any size in the Target Language 
documents based on their frequency of appearance (after subtraction of larger word 
strings from sub-strings) in the same approximate location of one another within the 
Parallel Text. Richardson does not "Flood" Target Language text with Source Language 
word translations that make up Source Language word strings along with Source 
Language context words and word strings. Richardson does not use translation of words 
and word strings indirectly through other third languages. Richardson does not perform 
word and word string association analysis between words and word strings of a single 
language using word strings of any size to the left and right of the query. Richardson 
does not require that a document input to be translated be parsed into overlapping word 
strings and require that Target Language translations of Source Language parsed word 
strings also have overlapping words or word strings with its neighboring translations to 
its left and/or right to approve a translation. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows an embodiment of a frequency association database according to 
the present invention. 

Figure 2 shows an embodiment of a computer system of the present invention for 
implementing the methods of the present invention. 

Figure 3 shows a memory device of the computer system of the present invention 
containing programs for implementing the methods of the present invention. 

DETAILED DESCRIPTION OF THE INVENTION 



I. INTRODUCTION 

As indicated above, one aspect of the present invention is to provide several 
different methods and apparatuses for creating and supplementing knowledgebases 
(knowledge acquisition) and for manipulating content from a first state into a second state 
using the knowledgebases (knowledge reconstruction). "Documents" as discussed herein 
are collections of information and ideas that are represented by symbols and characters 
fixed in some medium. For example, the documents can be electronic documents stored 
on magnetic or optical media, or paper documents such as books. The symbols and 
characters contained in documents represent ideas and information expressed using one 
or more systems of expression intended to be understood by users of the documents. The 
present invention manipulates documents in a first state, i.e., containing information 
expressed in one system of expression, to produce documents in a second state, i.e., 
containing substantially the same information expressed using a second system of 
expression. Thus, the present invention can manipulate or translate documents between 
systems of expression (for example, written and spoken languages such as English, 
Hebrew, and Cantonese, into other languages) in their respective encoding. In another 
aspect, the present invention can recognize different alternative representations of an idea 
or group of ideas within a single state or language, and automatically retrieve relevant 
associations, learned in the past or on-the-fly, when different groups of ideas are 
presented together (knowledge generation). 

For all aspects of the present invention, a word string, as described above, is 
defined as a group (two or more) of adjacent words in exact order; a word, as referred to 
in this document, can appear independently or as part of a word string, and can include 
conventional words as would be found in a dictionary, conventional characters (e.g., 
Chinese characters) as would be found in a dictionary, or any other characters or symbols 
with recognizable semantic value in a language or culture, including abbreviations (e.g., 
"inc.", or "dept."), symbols (e.g., "©", or "MSFT"), acronyms (e.g., "ASAP", or 
"NCAA"), etc. and, depending on user-defined parameters, can include or not include 
punctuation and any other mark used in the expression of language. When the present 
invention is applied more broadly beyond text to forms of input in alternative mediums 
(e.g., visual images), a word will refer to the smallest unit of independent idea 
represented in the alternative medium and word string will refer to a string of units of 
meaning represented in the medium and taken as a whole unit of meaning. 

A system or apparatus for implementing the knowledgebase creation and content 
conversion or content manipulation method of the present invention can be a computer 
system 200, shown in Figure 2. The computer system 200 includes a processor 202 
coupled via a bus 214 to a memory 208, an input device 210, and an output device 212. 
The computer system 200 can also include a storage device 204 and a network interface 
206. The processor 202 accesses data and programs stored in the memory 208. By 
executing the programs in memory 208, the processor can control the computer system 
200, and can carry out steps to manipulate data and to control devices including, for 
example, the input device 210, the output device 212, the storage device 204, the network 
interface 206, and the memory 208. Programs stored in memory 208 can include steps to 
perform the methods of the present invention such as content conversion, associating 
words and word strings, and database creation and supplementing methods. 

The storage device 204 records and stores information for later retrieval by the 
memory 208 or processor 202, and can include storage devices known in the art such as, 
for example, non-volatile memory devices, magnetic disc drives, tape drives, and optical 
storage devices. Storage device 204 can store programs and data, including databases 
that can be transferred to the memory 208 for use by the processor 202. Complete 
databases or portions of databases can be transferred to memory 208 for access and 
manipulation by the processor 202. The network interface 206 provides an interface 
between the computer system 200 and a network 216 such as the Internet, and transforms 
signals from the computer system 200 into a format that can be transmitted over the 
network 216, and vice versa. The input device 210 can include, for example, a keyboard 
and a scanner for inputting data into memory 208 and into the storage device 204. Input 
data can include text of documents to be stored in a Document Database for analysis and 
content conversion. The output device 212 includes devices for presenting information to 
a computer system user and can include, for example, a monitor screen and a printer. 

Following is a detailed description of the present invention, including the various 
database creation methods and apparatuses (knowledge acquisition), and the conversion 
method and apparatus (i.e., knowledge reconstruction). 

Section II describes the different methods for creating cross-state databases. 
Section III describes the knowledge reconstruction method and apparatus which uses the 
databases to convert documents between states (e.g., translation). Section IV describes 
methods and systems called Frequency Association Database (FAD) creation and 
Common Frequency Analysis (CFA) that provide the basis to create a knowledgebase of 
related ideas within a single state. Section V describes the methods of identifying 
semantic associations and relationships between words and word strings and other words 
and word strings (Knowledge Acquisition Lists) using one embodiment of the CFA of 
Section IV. Section VI describes several methods and systems for using single state 
knowledge acquisition in combination with other methods of the present invention to aid 
in language translation. Section VII describes how words and word strings of 
semantically equivalent ideas (identified as part of the knowledgebase built using the 
methods described in Section V) can be reconstructed in chains to produce alternate 
forms of the same complex idea within a single state or language. Section VIII describes 
methods for other applications utilizing the methods and systems of the present invention. 
Section IX uses the methods and systems described in Sections IV and V for smart 
applications. 

II. CROSS-STATE KNOWLEDGEBASE ACQUISITION METHOD AND 
APPARATUS 

The present invention provides several primary methods for cross-state 
knowledge acquisition, in one embodiment represented by the translation of words and 
word strings between two languages. In the first aspect of the present invention, a 
knowledgebase is acquired by analyzing documents to identify similar ideas expressed in 
different states or languages. One method of the present invention for acquiring a 
knowledgebase is to examine and compare different documents that express the same 
idea (either identically or as close to identical as possible). Building associations 
between two states using this method involves examining the same ideas in text or other 
material represented in two states or languages. 

A second method of the present invention, referred to as multiple state association 
or multilingual leverage, also builds associations for an idea represented in two states by 
using known translations that have been built using either the methods of the present 
invention or existing translation systems. 

A third method of the present invention, referred to as Target Language Flooding, 
builds associations between word strings in different languages using a monolingual 
corpus in the Target Language and/or Parallel Text, along with any one or more of the 
following: machine translation systems known in the art, cross-language dictionaries 
known in the art, and/or custom-built cross-language dictionaries. The system generates 
alternative candidate translations for individual words in a Source Language word string 
(Target translations of Source words may be words or phrases) and searches Target 
Language documents for word strings containing different combinations of the different 
individual word translations in close proximity to one another. 

A. Acquisition Using Parallel Text in Two States 

One of the present invention's methods for creating a cross-idea knowledgebase 
between two languages or states includes examining and operating on previously 
translated or otherwise related documents in two languages. The method and apparatus 
of the present invention is utilized such that a database is created with associations across 
the two states - accurate conversions, or more specifically, associations between ideas 
expressed in one state and ideas expressed in another. For every recurring word or word 
string in the first language, corresponding ranges in the second language documents are 
analyzed for recurring words and word strings (after the subtraction adjustment as 
illustrated in Figure 1) across the second language ranges. The translations and other 
relevant associations between the two states become stronger, i.e., more frequent, as more 
documents are examined and operated on by the present invention, such that by operation 
on a large enough "sample" of documents the most common associations become 
apparent and the method and apparatus can be utilized for conversion of new first 
language word strings into second language word strings. 

Another embodiment of the present invention utilizes a computing device 
such as a personal computer system of the type readily available in the prior art. 
Although the computing device is typically a common personal computer (either stand-alone 
or in a networked environment), other computing devices such as PDAs, wireless 
devices, servers, mainframes, and the like are similarly contemplated. However, the 
method and apparatus of the present invention does not need to use such a computing 
device and can readily be accomplished by other means, including manual creation of the 
cross-associations. The method by which successive documents are examined to enlarge 
the "sample" of documents and create the cross-association knowledge is varied - the 
documents can be set up for analysis and manipulation manually, by automatic feeding 
(such as automatic paper loaders as known in the prior art), by using search techniques 
such as web crawlers on the Internet to automatically seek out the related documents, 
other web search tools, or by any other method that makes text available in a digital 
format. 

Note that the present invention can produce an associated database by examining 
Comparable Text in addition to (or even instead of) Parallel Text. Furthermore, the 
method looks at all available documents collectively when searching for a recurring word 
or word string within a language. 

According to this embodiment of the present invention, cross-language documents 
are examined for the purpose of building the knowledgebase, a cross-language Frequency 
Association Database of translations of word strings between and among languages. 
These word strings serve as the building blocks used to solve longer translation queries. 
For illustrative purposes, assume that the following documents contain the same content 
(or, in a general sense, ideas) in two different languages. Document A is in Language A, 
Document B is in Language B. 

The first step in the present invention is to calculate a word range to be used in 
determining the approximate location of possible associations for any given word or word 
string. Since a cross-language word-for-word analysis alone will not yield productive 
results (i.e., word 1 in Document A will often not exist as the literal translation of word 1 
in Document B), and the sentence structure of one language may have an equivalent idea 
in a different location (or order) in the sentence than another language, the database 
creation technique of the present invention associates each word or word string in the first 
language with all of the words and word strings found in a selected range in the second 
language document. This is also important because one language often expresses ideas in 
longer or shorter word strings than another language. The range is determined by 
examining the two documents, and is used to compare the words and word strings in the 
second document against each word or word string in the first document. That is, the 
words and word strings in the range in the second document are examined for possible 
associations they may have with each recurring word and word string in the first 
document. By testing against a range, the database creation technique establishes a 
number of second language words and word strings that may equate and translate to the 
first language words and word strings. 

There are two attributes that must be determined in order to establish the range in 
the second language document in which to look for associations for any given word or 
word string in the first language document. The first attribute is the size of the range (to 
be used in the second document), measured by the number of words in the range (e.g., 50 
words). The second attribute is the location of the range in the second document, 
measured by the placement of the midpoint of the range. Both attributes are user-defined, 
but examples of preferred embodiments are offered below. In defining the size and 
location of the range, the goal is to ensure a high probability that the second language 
word or word string translation of the segment in the first language being analyzed will 
be included inside the range. 

Various techniques can be used to determine the size or value of the range 
including common statistical techniques such as the derivation of a bell curve based on 
the number of words in a document. With a statistical technique such as a bell curve, the 
range at the beginning and end of the document will be smaller than the range in the 
middle of the document. A bell-shaped frequency for the range allows reasonable chance 
of extrapolation of the translation whether it is derived according to the absolute number 
of words in a document or according to a certain percentage of words in a document. 
Other methods to calculate the range exist, such as a "step" technique where the range 
exists at one level for the first percentage of words, a second higher level for the middle 
percentage of words, and a third level equal to the first level for the last percentage of 
words. Again, all range attributes can be user-defined or established according to other 
possible parameters with the goal of capturing useful associations for the word or word 
string being analyzed in the first language. 
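
As a minimal sketch of the "step" technique, the following Python function returns a range size that depends on where the analyzed word falls in the document. The function name, the 20% edge fraction, and the two size levels are illustrative assumptions and are not values given in this description.

```python
def step_range_size(position, doc_length, edge_fraction=0.2,
                    edge_size=20, middle_size=50):
    """Return the range size (in words) for a word at `position` (1-based).

    Hypothetical parameters: the first and last `edge_fraction` of the
    document use `edge_size`; the middle portion uses the larger `middle_size`.
    """
    if not 1 <= position <= doc_length:
        raise ValueError("position must lie inside the document")
    relative = position / doc_length
    # First and last portions of the document share the same (lower) level.
    if relative <= edge_fraction or relative > 1 - edge_fraction:
        return edge_size
    # Middle portion of the document uses the higher level.
    return middle_size
```

A bell-curve variant would replace the two fixed levels with a smooth function of `relative`; the step form is shown only because its behavior is easy to inspect.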

The user may define the range, or the system may dynamically test and adjust to 
determine a final range by starting with a narrowly defined range (e.g., ten words) and 
iteratively expanding the range until a threshold is reached or the desired information in 
the Target Language is found. 
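
The dynamic adjustment just described might be sketched as follows. This is only an illustration under stated assumptions: `is_satisfied` is a hypothetical caller-supplied test standing in for "the desired information is found", and the starting size, increment, and threshold are example values.

```python
def expand_range(target_words, midpoint, is_satisfied,
                 start_size=10, step=10, max_size=50):
    """Grow a window around `midpoint` (1-based) until `is_satisfied` accepts it.

    `is_satisfied` is a hypothetical predicate over the list of words in the
    current window; `max_size` plays the role of the threshold.
    Returns the (start, end) 1-based inclusive bounds of the final window.
    """
    size = start_size
    while True:
        half = size // 2
        # Clamp the window to the document boundaries.
        start = max(1, midpoint - half)
        end = min(len(target_words), midpoint + half)
        window = target_words[start - 1:end]
        if is_satisfied(window) or size >= max_size:
            return start, end
        size += step
```

For example, with a 100-word Target document and a midpoint of 50, a predicate that is never satisfied ends at the threshold with the window running from word 25 to word 75.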

The location of the range within the second language document may depend on a 
comparison between the numbers of words in the two documents. What qualifies as a 
document for range location purposes is user-defined and is exemplified by paragraphs, 
aligned sentences, news articles, book chapters, and any other discretely identifiable units 
of content, made up of multiple data segments. If the word count of the two documents is 
roughly equal, the location of the range (i.e., the range midpoint) in the second language 
will roughly coincide with the location of the word or word string being analyzed in the 
first language. If the number of the words in the two documents is not equal, then a ratio 
may be used to correctly position the location of the range. For example, if Document A 
has 50 words and Document B has 100 words, the ratio between the two documents is 
1:2. The midpoint of Document A is word position 25. If word 25 in Document A is 
being analyzed, however, using word position 25 as the placement of the range midpoint 
in Document B is not effective, since this position (word position 25) is not the midpoint 
of Document B. Instead, the range midpoint in Document B for analysis of word 25 in 
Document A may be determined (1) by the ratio of words between the two documents 
(making the range midpoint in Document B word 50), (2) by manual placement at the 
midpoint of Document B, or (3) by many other techniques. 

The user-defined size of the range may be very large to ensure a high likelihood 
of locating the translation of the first language word or word string in the second 
language document. For example, it might be necessary to define the range as the 25 
words to the left of the range midpoint and 25 words to the right of the range midpoint 
(for a total range of 51 words). The 51-word range in this example would be from word 
25 to 75. The parsing and analysis of all combinations of words and word strings in the 
51-word range would require many calculations. 
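
The ratio-based placement and the 51-word range of the example above can be checked with a short Python sketch. The function names are assumptions introduced for illustration, and word positions are treated as 1-based.

```python
def range_midpoint(source_pos, source_len, target_len):
    """Map a 1-based source word position to a target range midpoint
    using the ratio of word counts between the two documents."""
    scaled = round(source_pos * target_len / source_len)
    # Keep the midpoint inside the target document.
    return min(max(scaled, 1), target_len)


def range_bounds(midpoint, half_width, target_len):
    """Return the 1-based inclusive (start, end) of the range around `midpoint`."""
    start = max(1, midpoint - half_width)
    end = min(target_len, midpoint + half_width)
    return start, end


# Document A has 50 words, Document B has 100 words, word 25 is analyzed.
mid = range_midpoint(25, 50, 100)      # 50
bounds = range_bounds(mid, 25, 100)    # (25, 75), i.e. 51 words
```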

A more efficient method to establish the range is to establish the 51-word range as 
described above, and then search it for certain known translations of words and word 
strings that closely precede the word or word string being analyzed in the Source (first) 
document as well as known translations of words and word strings closely following the 
word or word string being analyzed in the Source document. Identifying a user-defined 
number of word and word string translations in the ranges that precede and follow the 
first language word or word string being analyzed will narrow the beginning and end of 
the range to conduct the cross-language association algorithm for recurring words and 
word strings within the second language ranges. By "framing" a smaller range using 
known translations of words and word strings just preceding and following the word or 
word string being analyzed, the size of the final range is reduced and therefore so is the 
number of parsed words and word strings for which statistics must be calculated. 






For example, assume the system is analyzing the English word string "the most 
popular" to learn the associations to Language X words and word strings using Parallel 
Text between English and Language X. Further assume that one sentence in the English 
documents is "The car is the most popular mode of transportation in America." Rather 
than analyze all word strings within 25 words of the range midpoint of the corresponding 
second language document based on the ratio of words, one embodiment involves an 
examination within that initial 51-word range in Language X for a known translation of 
an English word string that precedes "the most popular" in the English document, such as 
the Language X word string translation of "The car." In this process, the present 
invention would also locate a word string that follows the analyzed word string in the 
English document, such as "in America" and locate its known Language X translation in 
the initial range. By identifying these known translations in Language X of word strings 
in English, the range used to parse all recurring words and word strings will encompass 
fewer potential combinations while still likely capturing the translation. Also, if the 
Source Language word string being analyzed contains a distinct (user-defined) word or 
token known to the system, the range midpoint can be efficiently set by placing it at the 
location of the translation of the token word in the Target Language text in the same 
approximate location of the document. 
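
The "framing" step can be illustrated with the following sketch, which narrows an initial range to the words lying between two known translations. It is a simplified reading of the paragraph above: only one preceding and one following anchor are used, and the function name, argument names, and placeholder tokens are hypothetical.

```python
def frame_range(target_words, start, end, before_translation, after_translation):
    """Narrow the 1-based inclusive range [start, end] using known translations.

    `before_translation` and `after_translation` are word lists: known target
    translations of strings that precede and follow the analyzed source string.
    If an anchor is not found inside the range, that side is left unchanged.
    """
    def find(phrase, lo, hi):
        n = len(phrase)
        for i in range(lo, hi - n + 2):      # i is a 1-based start position
            if target_words[i - 1:i - 1 + n] == phrase:
                return i
        return None

    new_start, new_end = start, end
    pos = find(before_translation, start, end)
    if pos is not None:
        new_start = pos + len(before_translation)   # first word after the anchor
    pos = find(after_translation, new_start, end)
    if pos is not None:
        new_end = pos - 1                           # last word before the anchor
    return new_start, new_end


# Placeholder Language X tokens: x_car_* stands for the known translation of
# "The car", x_america_* for the known translation of "in America".
target = ["p1", "p2", "x_car_1", "x_car_2", "q1", "q2", "q3", "q4", "q5",
          "x_america_1", "x_america_2", "p3"]
framed = frame_range(target, 1, len(target),
                     ["x_car_1", "x_car_2"], ["x_america_1", "x_america_2"])
```

Here the range shrinks from all 12 words to positions 5 through 9, so far fewer word strings need to be parsed while the region expected to hold the translation of "the most popular" is retained.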

By looking at the position of a word or word string in the document and noting all 
the words and word strings that fall within the range of a Parallel language document as 
described above, the cross-language Frequency Association Database creation technique 
of the present invention returns a set of words and/or word strings in the second language 
document that may translate to each word or word string in the first language document 
being analyzed. As the database creation technique of the present invention is utilized, 
the set of words and/or word strings that qualify as possible translations will be narrowed 
as association frequencies develop. Thus, after examining a pair of documents, the 
present invention will create association frequencies for words and/or word strings in one 
language with words and/or word strings in a second language. After a number of 
document pairs are examined according to the present invention, the cross-language 
association database creation technique will return higher and higher association 
frequencies for some words and/or word strings. After a large enough sample, the 
highest association frequencies result in possible translations; of course, the ultimate 
point where the association frequency is deemed to be an accurate translation is user-defined 
and subject to other interpretive translation techniques (such as those described in 
Provisional Application No. 60/276,107, entitled "Method and Apparatus for Content 
Manipulation" filed on March 16, 2001 and incorporated herein by reference). 
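
A minimal sketch of how association frequencies might accumulate is shown below. It assumes that the Target ranges for every occurrence of one Source word string have already been collected, counts each Target word or word string once per range, and keeps only those that recur. The function names and the `max_len` limit are assumptions, and the subtraction of larger word strings from sub-strings is not applied here.

```python
from collections import Counter


def ngrams(words, max_len):
    """Yield every word and word string (as a tuple) of up to `max_len` words."""
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def association_frequencies(ranges, max_len=3):
    """Count, for one source word string, how many target ranges contain each
    target word or word string.

    `ranges` is a list of word lists, one per occurrence of the source string.
    """
    counts = Counter()
    for target_range in ranges:
        # set() ensures a string is counted once per range.
        counts.update(set(ngrams(target_range, max_len)))
    # Only strings that recur across ranges are candidate associations.
    return Counter({s: c for s, c in counts.items() if c > 1})
```

Calling `association_frequencies(ranges).most_common()` then lists candidates from the highest frequency down; as more document pairs contribute ranges, the strongest candidates separate from incidental co-occurrences.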

As indicated above, the invention tests not only words but also word strings. As 
mentioned, word strings can include all punctuation and other marks as they occur, 
depending on user-defined parameters. If enough cross-language text exists to include 
punctuation as part of a word string, it is typically desirable to do so. After a single word 
in a first language is analyzed, the database creation technique of the present invention 
analyzes a two-word word string, then three-word word string, and so on in an 
incremental manner. This technique makes possible the translation of words or word 
strings in one language that translate into shorter or longer word strings (or a word) in 
another language, as often occurs. If a word or word string only occurs once in all 
available documents in the first language, the process immediately proceeds to analyze 
the next word or word string, where the analysis cycle occurs again. The analysis stops 
when all words and word strings that have multiple occurrences in the first language in 
all available Parallel and Comparable Text have been analyzed. 
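
The incremental analysis of single words, then two-word strings, then three-word strings, can be sketched as follows. This is an illustrative assumption about one way to enumerate recurring word strings: each document is a list of words, strings are not formed across document boundaries, and the function name is hypothetical.

```python
from collections import Counter


def recurring_word_strings(documents, max_len=None):
    """Return words and word strings that occur more than once across all documents.

    Analysis proceeds incrementally by length. It stops once no string of the
    current length recurs, because every longer recurring string would contain
    a recurring string of that length.
    """
    recurring = {}
    longest = max((len(d) for d in documents), default=0)
    limit = longest if max_len is None else min(max_len, longest)
    for n in range(1, limit + 1):
        counts = Counter(
            tuple(doc[i:i + n])
            for doc in documents
            for i in range(len(doc) - n + 1)
        )
        found = {s: c for s, c in counts.items() if c > 1}
        if not found:
            break
        recurring.update(found)
    return recurring
```

Words and word strings that occur only once are never returned, which corresponds to the process moving on immediately to the next word or word string.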

After the range is established, all documents should be aggregated and treated as 
one single document for purposes of looking for recurring words and word strings. For a 
word or word string not to repeat, it would have to occur only once in all available 
Parallel and Comparable Text. In addition, as another embodiment it is possible to 
examine the range corresponding to every word and word string regardless of whether or 
not it occurs more than once in all available Comparable and Parallel Text. 

As another embodiment, rather than pre-building the database, it can be built by 
resolving, on-the-fly, specific words and word strings that are entered as part of a query. 
When words and word strings are entered for translation, the present invention can look 
for multiple occurrences of the words and word strings in cross-language documents 
stored in memory that have not yet been analyzed, by locating cross-language text on the 
Internet using web crawlers, web search tools, and other devices, and, finally, by asking 
the user to supply a missing association based on the analysis of the query and the lack of 
sufficiently available cross-language material. This building of the knowledgebase 
on-the-fly represents "learning by doing" as the system builds words and word strings at the 
time they are needed for an application, and also stores them in the database for future 
reference. 

The present invention thus operates in such a manner so as to analyze word 
strings, and can operate in such a manner so as to account for context of word choice as 
well as grammatical idiosyncrasies such as phrasing, style, or abbreviations. 

Occurrences of a subset word or word string will be returned as an association on
its own and as part of a larger word string. In one embodiment of the present invention,
after tabulating the frequency of recurring words and word strings in cross-language text,
the system accounts for these occurrences of a subset word or word string that also
appears as part of a larger word string. The present invention accounts for these patterns
by subtracting from the frequency count the number of times a word or word string is
returned as part of a larger word string, as illustrated in Figure 1. For example, proper
names are sometimes presented complete (as in "John Doe"), abbreviated by first or
surname ("John" or "Doe"), or abbreviated by another manner ("Mr. Doe"). The present
invention will most likely return more individual word returns than word string returns
(i.e., more returns for the first or surnames rather than the full name word string "John
Doe"), because the words that make up a word string will necessarily be counted
individually as well as part of the phrase. Therefore, a mechanism to change the ranking
should be utilized. For example, in any document the name "John Doe" might occur one
hundred times, while "John" by itself or as part of "John Doe" might occur one hundred
twenty times, and "Doe" by itself or as part of "John Doe" might occur one hundred ten
times. The present invention's association method without adjustment will rank "John"
higher than "Doe," and both of those words higher than the word string "John Doe" - all
when attempting to analyze the word string "John Doe." By subtracting the number of
occurrences of the larger word string from the occurrences of the subset (or individual
returns) the proper ordering may be accomplished (although, of course, other methods
may be utilized to obtain a similar result). Thus, subtracting one hundred (the number of
occurrences for "John Doe"), from one hundred twenty (the number of occurrences for
the word "John"), the adjusted return for "John" is twenty. Applying this analysis yields
post-adjustment frequencies of one hundred for the word string "John Doe," twenty for
the word "John," and ten for the word "Doe," thus creating the proper associations. The
system subtracts the number of occurrences of the larger word string association from the
frequency of all subset associations when ranking associations of a second language to
the first language. These concepts are reflected in Figure 1.

In this embodiment, to adjust for words and word strings that are subsets of larger 
words and word strings that recur in the second language ranges, the frequency for each 
word or word string is reduced by the adjusted frequency of all word strings (of which it 
is a subset). Other user-defined methods can be used so that when a word string appears 
in a range, its word and word string component parts are adjusted for final frequency 
counts. 

For example, assume a word string in hypothetical Language X means "very good year".
If this word string is being analyzed to build a translation association using Parallel Text
from Language X into English, and the word string "very good year" appears 80 times in
the English language ranges, then the word strings "very good" and "good year" and the
individual words "very", "good" and "year" will all be counted by the system at least 80
times in the ranges because they are part of the three-word word string. One embodiment
of the system can make an adjustment to the frequency counts to prevent skewing the
counts when they are part of a larger recurring string. Below is an example of how the
frequency scores might be adjusted based on the following partial list of hypothetical
frequency counts for words and word strings in the ranges in the English language
documents across from the Language X word string being analyzed:

Word or word string    Freq. Count    Adj. Freq. Count

Very good year              80               80
Good year                  130               50
Good                       158               23
Year                       140               10
Very good                   85                5
Very                        87                2



The results are a product of each frequency count being adjusted by subtracting
the adjusted counts of all word strings it is a sub-string of. The adjusted count for the
word "good" (23) was reached by subtracting from its frequency count (158) the adjusted
counts for "very good year" (80), "good year" (50) and "very good" (5), the longer word
strings it was a part of that recurred in the range.
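
The subtraction just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `contains` and `adjusted_counts` are hypothetical, and it assumes a word string is a "sub-string" of another when its words appear as a contiguous run inside the longer string. The input values are the hypothetical frequency counts from the table above.

```python
from typing import Dict, Tuple


def contains(longer: Tuple[str, ...], shorter: Tuple[str, ...]) -> bool:
    """True if `shorter` appears as a contiguous run of words inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def adjusted_counts(raw: Dict[str, int]) -> Dict[str, int]:
    """Subtract from each count the adjusted counts of every longer string containing it."""
    words = {s: tuple(s.lower().split()) for s in raw}
    adjusted: Dict[str, int] = {}
    # Longest strings first, so their adjusted counts exist before shorter ones need them.
    for s in sorted(raw, key=lambda k: len(words[k]), reverse=True):
        supersets = [t for t in adjusted
                     if len(words[t]) > len(words[s]) and contains(words[t], words[s])]
        adjusted[s] = raw[s] - sum(adjusted[t] for t in supersets)
    return adjusted


raw = {"very good year": 80, "good year": 130, "good": 158,
       "year": 140, "very good": 85, "very": 87}
adj = adjusted_counts(raw)
names = adjusted_counts({"John Doe": 100, "John": 120, "Doe": 110})
```

Processing the longest word strings first matters because each shorter string is reduced by the already adjusted counts of the strings that contain it, not by their raw counts.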

By calculating co-occurrences of recurring word strings of any size located in
approximately the same relative areas across Parallel Text, the method of the present
invention provides a cross-idea database that can be used for document content
manipulation and conversion. Figure 1 depicts an embodiment of a cross-idea Frequency
Association Database created by the present invention using Parallel Text. This
embodiment of a cross-idea database comprises a listing of associated data segments in
columns 1 and 2. The data segments are symbols or groupings of characters that
represent a particular idea in a system of expression.






For example, where a system of expression in a document is a human language
that uses words, a segment can be a word or a string of words. Thus, System A Segments
in column 1 are data segments (in the present embodiment, words or characters with
semantic value) that represent various ideas and combinations of ideas Da1, Da2, Da3 and
Da4 in a hypothetical system of Expression A. System B Segments in column 2 are data
segments Db1, Db2, Db3, Db4, Db5, Db6, Db7, Db9, Db10 and Db12, that represent
various ideas (words or characters with semantic value) and some of the combinations of
those ideas in a hypothetical system of Expression B that are ordered by association
frequency with data segments in system of Expression A. Column 3 shows the Direct
Frequency, which is the number of times the segment or segments in Language B were
associated with the listed segment (or segments) in Language A. Column 4 shows the
Frequencies after Subtraction, which represents the number of times a data segment (or
segments) in Language B has been associated with a segment (or segments) in Language
A after subtracting the number of times that segment (or segments) has been associated as
part of a larger segment.

As shown in Figure 1, it is possible that a single segment, for example Da1, is
most appropriately associated with multiple segments, Db1 together with Db3 and Db4.
The higher the frequencies after subtraction between data segments, the higher the
probability that a System A Segment is equivalent to a System B Segment. In addition to
measuring adjusted frequencies using the metric "total number of occurrences," the
adjusted frequencies can also be measured, for example, by calculating the percentage of
time that particular System A Segments correspond to a particular System B Segment.
When the database is used to translate a document, the highest ranked associated segment
will be retrieved from the database first in the process. Often, however, the dual-anchor
overlap method used to combine segments for translation will dictate that a different,
lower ranked association be used because the higher ranked association proves
incompatible with the left or right context.

For example, if the database were queried for an association for Da1, it would
return Db1+Db3+Db4. If the dual-anchor overlap process that accurately combines data
segments for translation determines Db1+Db3+Db4 cannot be used, the database would
then return the next choice, Db9+Db10, to test for accurate combination through overlap
with the contiguous associated segment or segments, for translation.
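
The fall-back behavior described above might be sketched as follows. This is a minimal illustration, not the database implementation itself: the function name `best_compatible`, the `fits_context` callback (standing in for the dual-anchor overlap test), and the frequency values 56 and 40 are all assumptions made for the example and are not taken from Figure 1.

```python
from typing import Callable, List, Optional, Tuple


def best_compatible(candidates: List[Tuple[str, int]],
                    fits_context: Callable[[str], bool]) -> Optional[str]:
    """Return the highest ranked association that the context check accepts."""
    # Rank by frequency after subtraction, highest first.
    for segment, _freq in sorted(candidates, key=lambda c: c[1], reverse=True):
        if fits_context(segment):
            return segment
    return None  # no stored association is compatible with the context


# Hypothetical adjusted frequencies for associations of Da1.
da1_candidates = [("Db9+Db10", 40), ("Db1+Db3+Db4", 56)]

first_choice = best_compatible(da1_candidates, lambda seg: True)
fallback = best_compatible(da1_candidates, lambda seg: seg != "Db1+Db3+Db4")
```

Here `first_choice` is the top ranked association, and `fallback` is the next choice returned once the top ranked association has been rejected by the context check.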

Additionally, the database can be instructed to ignore common words when
counting association frequencies for words. For example, in English, words such as "it",
"an", "a", "of", "as", "in", and the like (known in the art as "stop words") can be
removed from consideration. This allows the association database creation technique of
the present invention to prevent common words from potentially skewing the analysis
without excessive subtraction calculations (reducing noise and unnecessary computation).
It should be noted that if these or any other common words or subset words or word
strings of larger word strings were not "subtracted" out of the association database, they
would ultimately not be approved as a translation, unless appropriate, because the
dual-anchor overlap process (described in more detail herein) would not accept them.

It should be noted that stop words are typically included in the analysis of a word
string they are a part of. For example, while the system may be instructed to ignore the
occurrences of words like "a" and "is" when found in the ranges when establishing
frequencies for an individual word, the system will typically not ignore the words "a" and
"is" as part of a recurring word string such as "she is a good student".
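
One way to express this rule is a simple filter applied to each candidate before it is counted. The sketch below is illustrative only; the name `countable` and the particular stop word list are assumptions, and an actual stop word list would be user-defined.

```python
# Hypothetical stop word list; in practice this would be user-defined per language.
STOP_WORDS = {"it", "an", "a", "of", "as", "in", "is"}


def countable(candidate: str) -> bool:
    """Skip a stop word standing alone, but keep any multi-word string containing one."""
    words = candidate.lower().split()
    if not words:
        return False
    return len(words) > 1 or words[0] not in STOP_WORDS
```

With this filter, "a" and "is" are ignored as individual words, while "she is a good student" is still counted as a recurring word string.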

Other calculations to adjust the association frequencies could be made to ensure
the accurate reflection of the number of common occurrences of words and word strings.
For example, an adjustment to avoid double counting may be appropriate when the
ranges of analyzed words overlap, as described below. Adjustments are desirable in
these cases to build more accurate association frequencies.

An example of an embodiment of the method and apparatus for creating and
supplementing a cross-idea Frequency Association Database according to the present
invention will now be described using the two documents presented in Table 1 below:

Table 1

Document A (Language A)    Document B (Language B)

X Y Z X W V Y Z X Z        AA BB CC AA EE FF GG CC



While this example focuses on recurring words and word strings in only a few
characters of Parallel Text, this is for illustrative purposes only. In the present invention
recurring words and word strings will be analyzed using all available Parallel and
Comparable Text in the aggregate. As indicated above, if multiple texts are combined,
the range may first be established by examining each pair of documents, then recurring
words and word strings in the ranges may be counted across all documents in the
aggregate.

Using the Parallel documents listed above (Document A is in the first language
(or Source Language); and Document B is in the second language (or Target Language)),
the following steps occur for the database creation technique.

Step 1. First, the size and location of the range is determined. As
indicated, the size and location may be user-defined or may be approximated by a variety
of methods including but not limited to comparing word counts in Source and Target
documents, finding known lexical anchors, finding sentence boundaries that correspond,
or any other method. In this illustration, the word count of the two documents is used
and is approximately equal (ten words in Document A, eight words in Document B),
therefore we will locate the range midpoint to coincide with the location of the word or
word string in Document A. (Note: As the ratio of word counts between the documents
is 80%, the location of the range alternatively could have been established by applying a
fraction of 4/5ths.) In this example, variable range sizes will be used to approximate a
bell curve: the range will be (+/-) 1 at the beginning and end of the document, and (+/-) 2
in the middle. However, as indicated, the size and location of the range (or the method
used to determine the range) is entirely user-defined and will likely be much larger than
the range here (chosen simply to illustrate the concepts) in order to increase the
probability that the translation of the Source Language word or word string will be in the
Target Language range in the Parallel Text.
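
The range rule chosen for this illustration can be sketched as a small function. This is a sketch under stated assumptions: the name `range_positions` and the `edge` parameter are hypothetical, the range midpoint is taken to be the same position number in Document B, and "beginning and end of the document" is assumed to mean the two outermost positions at each end of Document A, which is one reading of the worked steps rather than a rule stated in the text.

```python
from typing import List


def range_positions(pos: int, source_len: int, target_len: int, edge: int = 2) -> List[int]:
    """Target positions (1-based) examined for a source word at 1-based position `pos`."""
    # Assumed rule: (+/-) 1 within `edge` positions of either end, (+/-) 2 elsewhere.
    near_edge = pos <= edge or pos > source_len - edge
    half_width = 1 if near_edge else 2
    lo = max(1, pos - half_width)           # truncate at the start of Document B
    hi = min(target_len, pos + half_width)  # truncate at the end of Document B
    return list(range(lo, hi + 1))


# Document A has ten words and Document B has eight.
x1 = range_positions(1, 10, 8)
x2 = range_positions(4, 10, 8)
x3 = range_positions(9, 10, 8)
```

Truncation at the document boundaries is what leaves only position 8 for a word at position 9 of Document A, and no positions at all for a word at position 10.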

Step 2. Next, the first word in Document A is examined and tested against
Document A to determine the number of occurrences of that word in the document. In
this example, the first word in Document A is X: X occurs three times in Document A, at
positions 1, 4, and 9. The position numbers of a word or word string are simply the
locations of that word or word string in the document relative to other words. Thus, the
position numbers correspond to the number of words in a document, ignoring
punctuation. For example, if a document has ten words in it, and the word "king"
appears twice, the position numbers of the word "king" are merely the places (out of ten
words) where the word appears.

Because word X occurs more than once in the document, the process proceeds to
the next step. If word X only occurred once, then that word would be skipped and the
process continued to the next word where the creation process is continued.
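
Locating the occurrences of a word or word string can be sketched as below. The function name `occurrences` is hypothetical, and the sketch assumes the document has already been split into words with punctuation removed.

```python
from typing import List, Sequence


def occurrences(doc: Sequence[str], phrase: Sequence[str]) -> List[int]:
    """1-based starting positions at which `phrase` occurs in `doc`."""
    n = len(phrase)
    return [i + 1 for i in range(len(doc) - n + 1)
            if list(doc[i:i + n]) == list(phrase)]


doc_a = "X Y Z X W V Y Z X Z".split()
x_positions = occurrences(doc_a, ["X"])
xy_positions = occurrences(doc_a, ["X", "Y"])
yz_positions = occurrences(doc_a, ["Y", "Z"])
```

A word or word string whose list of positions has only one entry, such as "X Y" here, would be skipped under the rule stated above.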

Step 3. Possible Target Language translations for Source Language word
X at position 1 are returned: applying the range to Document B yields words at positions
1 and 2 (1 +/- 1) in Document B: AA and BB (located at positions 1 and 2 in Document
B). All possible combinations are returned as potential translations or relevant
associations for X: AA, BB, and AA BB (as a word string combination). Thus, X1 (the
first occurrence of word X) returns AA, BB, and AA BB as associations.

Step 4. The next position of word X is analyzed. This word (X2) occurs at
position 4. Since position 4 is near the center of the document, the range (as determined
above) will be two words on either side of position 4. Possible associations are returned
by looking at word 4 in Document B and applying the range (+/-) 2 - hence, two words
before word 4 and two words after word 4 are returned. Thus, words at positions 2, 3, 4,
5, and 6 are returned. These positions correspond to words BB, CC, AA, EE, and FF in
Document B. All forward contiguous permutations of these words (and their combined
word strings) are considered. Thus, X2 returns BB, CC, AA, EE, FF, BB CC, BB CC
AA, BB CC AA EE, BB CC AA EE FF, CC AA, CC AA EE, CC AA EE FF, AA EE,
AA EE FF, and EE FF as possible associations.
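
The "forward contiguous permutations" of a range can be enumerated as every run of adjacent words read left to right. The following sketch uses the hypothetical function name `forward_strings` and reproduces the candidates for X2.

```python
from typing import List


def forward_strings(words: List[str]) -> List[str]:
    """Every contiguous run of words in the range, read left to right."""
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]


x2_returns = forward_strings(["BB", "CC", "AA", "EE", "FF"])
```

A range of n words yields n(n+1)/2 candidates, so the five-word range for X2 yields the fifteen words and word strings listed in Step 4.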

Step 5. The returns of the first occurrence of X (X1), which is in position
1, are compared to the returns of the second occurrence of X (X2), which is in position 4,
and matches are determined. Note that returns which include the same word or word
string occurring in the overlap of the two ranges should be reduced to a single
occurrence. For instance, in this example the word at position 2 is BB; this is returned
both for the first occurrence of X (when operated on by the range) and the second
occurrence of X (when operated on by the range). Because this same word position is
returned for both X1 and X2, the word is counted as one occurrence. If, however, the
same word is returned in an overlapping range, but from two different word positions,
then the word is counted twice and the association frequency is recorded. In this case the
return for word X is AA, since that word (AA) occurs in both association returns for X1
and X2. Note that the other word that occurs in both association returns is BB; however,
as described above, since that word is the same position (and hence the same word)
reached by the operation of the range on the first and second occurrences of X, the word
can be disregarded (i.e., treated as if it had only appeared in one of the ranges).
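
The comparison in Step 5 depends on remembering where in Document B each return came from. The sketch below is one possible way to do that; the function names `returns_with_positions` and `matches` are hypothetical, and the range is assumed to be a list of consecutive 1-based positions.

```python
from typing import Dict, List, Set


def returns_with_positions(doc_b: List[str], positions: List[int]) -> Dict[str, Set[int]]:
    """Map each contiguous string inside the range to the 1-based positions where it starts."""
    found: Dict[str, Set[int]] = {}
    for i, start in enumerate(positions):
        for j in range(i, len(positions)):
            text = " ".join(doc_b[p - 1] for p in positions[i:j + 1])
            found.setdefault(text, set()).add(start)
    return found


def matches(first: Dict[str, Set[int]], second: Dict[str, Set[int]]) -> List[str]:
    """Strings returned for both occurrences from at least two different positions."""
    return sorted(s for s in first.keys() & second.keys()
                  if len(first[s] | second[s]) > 1)


doc_b = "AA BB CC AA EE FF GG CC".split()
x1_returns = returns_with_positions(doc_b, [1, 2])
x2_returns = returns_with_positions(doc_b, [2, 3, 4, 5, 6])
x_matches = matches(x1_returns, x2_returns)
```

BB appears in both sets of returns but only from position 2, so it is not treated as a match; AA is returned from position 1 for X1 and from position 4 for X2, so it is.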

Step 6. The next position of word X (position 9) (X3) is analyzed.
Applying a range of (+/-) 1 (near the end of the document) returns associations at
positions 8, 9 and 10 of Document B. Since Document B has only 8 positions, the results
are truncated and only word position 8 is returned as a possible value for X: CC. (Note:
alternatively, user-defined parameters could have called for a minimum of two characters
as part of the analysis that would have returned position 8 and the next closest position
(which is GG in position 7)).

Comparing X3's returns to X1's returns reveals no matches and thus no
associations.

Step 7. The next position of word X would be analyzed; however, there
are no more occurrences of word X in Document A. At this point an association
frequency of one (1) is established for word X in Language A, to word AA in Language
B.

Step 8. Because no more occurrences of word X occur, the process is
incremented by a word and a word string is tested. In this case the word string examined
is "X Y", the first two words in Document A. The same techniques described in steps 2-7
are applied to this phrase.

Step 9. By looking at Document A, there exists only one occurrence of the
word string X Y. At this point the incrementing process stops and no database creation
occurs. Because an end-point has been reached, the next word is examined (this process
occurs whenever no matches occur for a word string); in this case the word in position 2
of Document A is "Y".

Step 10. Applying the process of steps 2-7 for the word "Y" yields the
following:

Two occurrences of word Y (positions 2 and 7) exist, so the database creation
process continues (again, if Y only occurred once in Document A, then Y would not be
examined);

The size of the range at position 2 is (+/-) 1 word;

Application of the range to Document B (position 2, the location of the first
occurrence of word Y) returns results at positions 1, 2, and 3 in Document B;

The corresponding foreign language words in those returned positions are: AA,
BB, and CC;

Examining only forward-permutations yields the following possibilities for Y1:
AA, BB, CC, AA BB, AA BB CC, and BB CC;

The next position of Y is analyzed (position 7);

The size of the range at position 7 is (+/-) 2 words;

Application of that range to Document B (position 7) returns results at positions 5,
6, 7, and 8: EE, FF, GG, and CC;

All permutations yield the following possibilities for Y2: EE, FF, GG, CC, EE
FF, EE FF GG, EE FF GG CC, FF GG, FF GG CC, and GG CC;

Matching results from Y1 returns CC as the only match;

Combining matches for Y1 and Y2 yields CC as an association frequency for Y.

Step 11. End of range incrementation: Because the only possible match for
word Y (word CC) occurs at the end of the range for the first occurrence of Y (CC
occurred at position 3 in Document B), the range is incremented by 1 at the first
occurrence to return positions 1, 2, 3, and 4: AA, BB, CC, and AA; or the following
forward permutations: AA, BB, CC, AA BB, AA BB CC, AA BB CC AA, BB CC, BB
CC AA, and CC AA. Applying this result still yields CC as the only potential translation
for Y. The range is incremented because the returned match was at the end of the range
for the first occurrence (the base occurrence for word "Y"); whenever this pattern occurs,
an end of range incrementation will occur as a sub-step (or alternative step) to ensure the
idea is not truncated.
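
The end of range incrementation might be sketched as a check on where the match ends relative to the range. The name `extend_if_match_at_end` and its parameters are hypothetical, and the sketch assumes the range is a list of consecutive 1-based positions in Document B that is extended by exactly one position.

```python
from typing import List


def extend_if_match_at_end(positions: List[int], match_start: int,
                           match_len: int, target_len: int) -> List[int]:
    """Grow the range by one position when the match ends on the last position of the range."""
    match_end = match_start + match_len - 1
    if positions and match_end == positions[-1] and positions[-1] < target_len:
        return positions + [positions[-1] + 1]
    return positions


# CC (one word, at position 3) ends the range for the first occurrence of Y.
y1_extended = extend_if_match_at_end([1, 2, 3], match_start=3, match_len=1, target_len=8)
```

The extended range can then be searched again to see whether the match continues into a longer word string, which in this example it does not.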

Step 12. Since no more occurrences of "Y" exist in Document A, the
analysis increments one word in Document A and the word string "Y Z" is examined (the
next word after word Y). Incrementing to the next string (Y Z) and repeating the process
yields the following:

Word string Y Z occurs twice in Document A: position 2 and 7. Possibilities for
Y Z at the first occurrence (Y Z1) are AA, BB, CC, AA BB, AA BB CC, BB CC;
(alternatively the range parameters can be defined to include the expansion of the size of
the range as word strings being analyzed in Language A get longer.)

Possibilities for Y Z at the second occurrence (Y Z2) are EE, FF, GG, CC, EE FF,
EE FF GG, EE FF GG CC, FF GG, FF GG CC, and GG CC;

Matches yield CC as a possible association for word string Y Z;

Extending the range (the end of range incrementation) yields the following for Y
Z: AA, BB, CC, AA, AA BB, AA BB CC, AA BB CC AA, BB CC, BB CC AA, and CC
AA.

Applying the results still yields CC as an association frequency for word string
Y Z.

Step 13. Since no more occurrences of "Y Z" exist in Document A, the
analysis increments one word in Document A and the word string "Y Z X" is examined
(by adding the next word after word Z (position 3) in Document A). Incrementing to the
next word string (Y Z X) and repeating the process (Y Z X occurs twice in Document A)
yields the following:

The range for the first occurrence of Y Z X includes positions 1, 2, 3, 4, and 5;

Permutations are AA, BB, CC, AA, EE, AA BB, AA BB CC, AA BB CC AA,
AA BB CC AA EE, BB CC, BB CC AA, BB CC AA EE, CC AA, CC AA EE, and AA
EE;

The range for the second occurrence of Y Z X includes positions 5, 6, 7, and 8;

Permutations are EE, FF, GG, CC, EE FF, EE FF GG, EE FF GG CC, FF GG, FF
GG CC, and GG CC.

Comparing the two yields CC as an association frequency for word string Y Z X;
again, the return of EE as a possible association is disregarded because it occurs in both
instances as the same word (i.e., at the same position).

Step 14. Incrementing to the next word string (Y Z X W) finds only one 
occurrence; therefore the word string database creation is completed and the next word is 
examined: Z (position 3 in Document A). 

Step 15. Applying the steps described above for Z, which occurs 3 times in
Document A, yields the following:

Returns for Z1 are: AA, BB, CC, AA, EE, AA BB, AA BB CC, AA BB CC AA,
AA BB CC AA EE, BB CC, BB CC AA, BB CC AA EE, CC AA, CC AA EE, and AA
EE;

Returns for Z2 are: FF, GG, CC, FF GG, FF GG CC, and GG CC;

Comparing Z1 and Z2 yields CC as an association frequency for Z;

Z3 (position 10) has no returns in the range as defined. However, if we add to the
parameters that there must be at least one return for each Language A word or word
string, the return for Z3 will be CC.

Comparing the returns for Z3 with Z1 yields CC as an association frequency for
word Z. However, this association is not counted because CC in word position 8 was
already accounted for in Z2's association above. When an overlapping range would cause
the process to double count an occurrence, the system can reduce the association frequency
to more accurately reflect the number of true occurrences.

Step 16. Incrementing to the next word string yields the word string Z X,
which occurs twice in Document A. Applying the steps described above for Z X yields
the following:

Returns for Z X1 are: AA, BB, CC, AA, EE, FF, AA BB, AA BB CC, AA BB
CC AA, AA BB CC AA EE, AA BB CC AA EE FF, BB CC, BB CC AA, BB CC AA
EE, BB CC AA EE FF, CC AA, CC AA EE, CC AA EE FF, AA EE, AA EE FF, and EE
FF.

Returns for Z X2 are: FF, GG, CC, FF GG, FF GG CC, and GG CC;

Comparing the returns yields the association between word string Z X and CC.

Step 17. Incrementing, the next phrase is Z X W. This occurs only once, so
the next word (X) in Document A is examined.

Step 18. Word X has already been examined in the first position. However,
the second position of word X, relative to the other document, has not been examined for
possible returns for word X. Thus word X (in the second position) is now operated on as
in the first occurrence of word X, going forward in the document:

Returns for X at position 4 yield: BB, CC, AA, EE, FF, BB CC, BB CC AA, BB
CC AA EE, BB CC AA EE FF, CC AA, CC AA EE, CC AA EE FF, AA EE, AA EE FF,
and EE FF.

Returns for X at position 9 yield: CC.

Comparison of the results of position 9 to results for position 4 yields CC as a
possible match for word X and it is given an association frequency.

Step 19. Incrementing to the next word string (since, looking forward in the
document, no more occurrences of X occur for comparison to the second occurrence of
X) yields the word string X W. However, this word string does not occur more than once
in Document A so the process turns to examine the next word (W). Word "W" only
occurs once in Document A, so incrementation occurs - not to the next word string, since
word "W" only occurred once, but to the next word in Document A - "V". Word "V"
only occurs once in Document A, so the next word (Y) is examined. Word "Y" does not
occur in any other positions higher than position 7 in Document A, so the next word (Z)
is examined. Word "Z" occurs again after position 8, at position 10.

Step 20. Applying the process described above for the second occurrence of
word Z yields the following:

Returns for Z at position 8 yield: GG, CC, and GG CC;

Returns for Z at position 10 yield: CC;

Comparing results of position 10 to position 8 yields no associations for word Z.

Again, word CC is returned as a possible association; however, since CC
represents the same word position reached by analyzing Z at position 8 and Z at position
10, the association is disregarded (i.e., treated as if it had only appeared in one of the
ranges).

Step 21. Incrementing by one word yields the word string Z X; this word
string does not occur in any more (forward) positions in Document A, so the process
begins anew at the next word in Document A - "X". Word X does not occur in any more
(forward) positions of Document A, so the process begins anew. However, the end of
Document A has been reached and the analysis stops.

Step 22. The final association frequency is tabulated combining all the
results from above and subtracting out duplications and, if they had occurred, subset
strings of larger strings (as reflected in Figure 1), as previously explained.

Obviously, there is insufficient data to return conclusive results for words and
word strings in Document A. As more document pairs are examined containing words
and word strings with those associations examined above, the association frequencies will
increase such that word and word string translations between Languages A and B will
build strong associations. The above range calculations illustrate the concept although
typically the user-defined range will be substantially larger than three words to ensure the
translation is usually included.

To further strengthen the associations that are built using Parallel Text and the
process just described, the process can be run in the reverse direction. The system can
use the Target Language word string translation candidates that appeared most frequently
in the Target Language ranges using the process just described, and build associations for
those Target Language words and word strings in the Source Language using the
available Parallel Text. If the Source Language word or word string that originally
generated the Target Language translation candidate ranks high enough (based on
user-defined frequency or percentage) on the Target Language candidates list, the Target
Language translation candidate for that Source Language term can be approved as a valid
translation for the Source Language term (word or word string). This is referred to as the
"bi-directional locking mechanism" of the present invention. Ultimately, Parallel Text in
each language pair can be used to build out association databases going in both
directions.
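
The approval test of the bi-directional locking mechanism could be sketched as a rank check on the reverse-direction associations. Everything named here is an assumption made for illustration: the function `is_locked`, the `top_n` threshold (standing in for the user-defined frequency or percentage), and the frequencies in the sample data.

```python
from typing import Dict

# Maps a Target Language term to its Source Language candidates and their frequencies.
Associations = Dict[str, Dict[str, int]]


def is_locked(source: str, candidate: str, reverse: Associations, top_n: int = 1) -> bool:
    """Approve `candidate` if `source` ranks within the top `top_n` reverse associations."""
    back = reverse.get(candidate, {})
    ranked = sorted(back, key=lambda term: back[term], reverse=True)
    return source in ranked[:top_n]


# Hypothetical reverse-direction frequencies built from the same Parallel Text.
reverse_db = {"CC": {"Y Z": 5, "Z": 3}}
approved = is_locked("Y Z", "CC", reverse_db)
```

A stricter or looser lock follows from changing `top_n`, or from replacing the rank check with a minimum frequency or percentage.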

In an alternative embodiment for cross-language association using Parallel Text,
for each recurring word or word string being analyzed in the Source Language,
corresponding ranges in the Target Language are determined in accordance with the
above-described method. Then all recurring words and word strings within those ranges
are added together to obtain their frequency counts. The frequency of words and word
strings in the ranges is reduced by the frequency count of larger word strings to avoid
counting smaller parts of larger word strings, as described above and illustrated in
Figure 1. This will give less weight to the most frequent word strings than the embodiment
described above that associates words and word strings for each range individually to all
other ranges. The embodiment described here, therefore, typically will require more
documents to build reliable translations.

For example, assume that the Language X word string "11 mm pp" is being
analyzed to find an association in Parallel documents in Language Y. If the word string
"11 mm pp" is found four times in the Language X documents, four ranges of Language Y
words and word strings are established in Language Y documents, one corresponding to
each Language X word string "11 mm pp" found in the Parallel documents. If one correct
translation in Language Y is "KK BB ZZ" and it is found in all four ranges, the above
embodiment would produce a frequency count of four. The previous embodiment
(analyzing each range independently against all other ranges) would produce a frequency
count for "KK BB ZZ" of six. Once ranges are established, there are a variety of
user-defined methods for tabulating frequencies of recurring words and word strings which,
depending on the tabulation method, will provide higher or lower relative weights to
individual results; the methods described above illustrate two preferred embodiments of
tabulation methods.
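
The difference between the two tabulation methods can be seen by counting the same candidate both ways. In this sketch the function names `count_per_range` and `count_pairwise` are hypothetical, and each range is assumed to be represented as the set of words and word strings found in it.

```python
from itertools import combinations
from typing import List, Set


def count_per_range(ranges: List[Set[str]], phrase: str) -> int:
    """Aggregate method: one count for every range that contains the phrase."""
    return sum(phrase in r for r in ranges)


def count_pairwise(ranges: List[Set[str]], phrase: str) -> int:
    """Range-against-range method: one count for every pair of ranges sharing the phrase."""
    return sum(phrase in a and phrase in b for a, b in combinations(ranges, 2))


# Four ranges, each containing the correct translation.
four_ranges = [{"KK BB ZZ", "KK", "BB ZZ"} for _ in range(4)]
aggregate = count_per_range(four_ranges, "KK BB ZZ")
pairwise = count_pairwise(four_ranges, "KK BB ZZ")
```

With a phrase present in n ranges, the first method counts n and the second counts n(n-1)/2, which is why the pairwise method gives relatively more weight to the most frequent word strings as n grows.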

The languages can be any type of conversion and are not necessarily limited to
spoken/written languages. For example, the conversion can encompass computer
languages, specific data codes such as ASCII, and the like. The database is dynamic, i.e.,
the database grows as content is input into the translation system, with successive
iterations of the translation system using content entered at a previous time.

As demonstrated, this embodiment is representative of one technique of the
present invention used to create associations. The techniques of the present invention
need not be limited to language translation. In a broad sense, the techniques will apply to
any two expressions of the same idea that may be associated, for at its essence foreign
language translation merely exists as a paired association of the same idea represented by
different words or word strings. Thus, the present invention may be applied to
associating data, sound, music, video, computer programming languages, or any
wide-ranging representations that exist for an idea, including ideas that are embodied by any
sensory (sound, sight, smell, etc.) experiences. All that is required is that the present
invention analyzes two embodiments of the same idea associated by co-occurrence of
time (or in the case of documents, location of co-occurrence).

For words or word strings that cannot be translated using the cross-language 
documents, another embodiment of the present invention (described later) can generate 
words and word strings that are semantically equivalent to words or word strings in the
Target or Source Language to provide additional ways to identify alternative word or
word string translations. This method also allows the interchanging of certain class 
members of broad categories that share common contexts and sometimes can have 
potentially infinite members, such as names and numbers. 

In addition, if available cross-language documents do not furnish statistically 
significant results for translation, user-defined parameters can combine the other methods 
of cross-language word string association of the present invention instead of, or in 
combination with, the method using Parallel Text. As a last resort, users can examine the 
candidates for translations and other associations that do not meet user-defined thresholds 
for approval, and approve and rank appropriate choices manually. 

B. Acquisition Using Multiple-State Texts 

Another embodiment of the present invention provides a method for building 
associations between equivalent or similar ideas in two languages or states by using 
associations between each of those two states and other third states. As documents in
more language pairs are examined, the method and apparatus of the present invention will
begin filling in "deduced associations" between language pairs based on those languages
having a common association with other third languages, but not directly with one
another. This type of indirect translation through multiple states is known as
"multilingual leverage."

Deduced associations through the multilingual leverage technique can be
produced between text in a pair of languages when the Source word string being
translated has a known translation into one or several third languages, and the different
third language translations have known translations into the Target Language. For
example, if there is insufficient cross-language text to translate directly a Language A
phrase "aa dd pz" into a Language B phrase, deducing an association can include
comparing this Language A phrase with the phrase's translations in Languages C, D, E,
and F, as shown in Table 2. Then, the translations of "aa dd pz" in Languages C, D, E,
and F can be translated into Language B, as shown in Table 3. Deducing the association
between Language A phrase "aa dd pz" and a phrase in Language B further includes
comparing the Language B phrases that have been translated from the Language C, D, E,
and F translations of "aa dd pz." Some of the Language B phrases that have been
translated from the Language C, D, E, and F translations of "aa dd pz" may be identical
and, in this preferred embodiment of the present invention, these will represent the
correct Language B translation of the Language A phrase "aa dd pz." As shown in Table
3, Language C, D, and F translations to Language B produce identical Language B
phrases, to provide the correct Language B translation, "UyTByM." Thus, a deduced
association can be created between the Language A phrase and its correct Language B
translation. Language E translation into Language B produces the non-identical
Language B phrase "ZnVPiO." This may indicate that the Language A phrase "aa dd pz" or
the Language E phrase "153" has more than one meaning, or that the Language B phrases
"UyTByM" and "ZnVPiO" are semantically equivalent (or similar). The translation
"ZnVPiO" will be approved at a time when it is confirmed by an indirect translation
through another language, or when that translation result is produced using some other
method.

Table 2

Language A    Language C    Language D    Language E    Language F

aa dd pz      Aid           Zyp           153           1AAAA))$

Table 3

Language      Translation from Language A    Translation to Language B
              for "aa dd pz"

Language C    Aid                            UyTByM
Language D    Zyp                            UyTByM
Language E    153                            ZnVPiO
Language F    1AAAA))$                       UyTByM

In another embodiment, use of the multilingual leverage method and apparatus of
the present invention described above can improve the accuracy of existing translation
systems known in the art. Existing translation systems (e.g., Rule-Based MT, SMT) will
take a query and produce a result from Language A to Language B; this result may be
compared to the results of the translation (using systems and apparatus of the prior art) of
the query from Language A to other languages (e.g., Languages C, D, E, and F) and,
subsequently, from those languages to Language B (using systems and apparatus of the
prior art).

In order to confirm a translation, one embodiment of multilingual leverage using
existing machine translation systems can require each Target Language word string (that
is translated indirectly through a number of third languages) to appear in a user-defined
number of common results in the Target Language, as described above. Requiring that a
user-defined number of indirect Target Language translations of a word string (using
intermediate third language state of the art translation systems) match exactly to one
another in the Target Language before being confirmed will increase the accuracy of each
translated word string. While the accuracy of translation systems known in the art is not
high, a number of common results in the Target Language from different intermediate
third languages can exist if enough third language translation systems are used.
Moreover, by connecting these indirect Target Language translations with a relatively
high user-defined overlap required in the dual-anchor overlap aspect of the present
invention (described in detail later), the accuracy of results of this embodiment can be
further tested and enhanced.
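
The confirmation rule described above can be illustrated as a simple vote over the
indirect translations. The following Python sketch is illustrative only: the function
name deduce_translation and the parameter min_agreement (standing in for the
user-defined number of common results) are assumptions introduced here and are not part
of the disclosure. The input values are those of Table 3.

```python
from collections import Counter


def deduce_translation(pivot_results, min_agreement):
    """Confirm a Target phrase if enough third languages agree on it.

    pivot_results maps each third language to the Target Language phrase
    obtained by translating through that language. Returns a pair:
    (confirmed phrase or None, sorted list of unconfirmed phrases).
    """
    counts = Counter(pivot_results.values())
    if not counts:
        return None, []
    # most_common(1) yields the phrase with the most votes; on a tie the
    # phrase encountered first wins, which a real system would refine.
    phrase, votes = counts.most_common(1)[0]
    if votes < min_agreement:
        return None, sorted(counts)
    return phrase, sorted(p for p in counts if p != phrase)


# Indirect translations of the Language A phrase "aa dd pz" (Table 3).
table_3 = {"C": "UyTByM", "D": "UyTByM", "E": "ZnVPiO", "F": "UyTByM"}
confirmed, pending = deduce_translation(table_3, min_agreement=3)
print(confirmed, pending)
```

With a threshold of three, "UyTByM" is confirmed and "ZnVPiO" remains pending, which
matches the outcome described for Table 3. Raising the threshold to four would leave
both phrases unconfirmed.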

Another embodiment of the multilingual leverage technique can use translations
from Source Language to intermediate third languages and from those third languages
into Target Language using a combination of the present invention's cross-language
learning and word string translations in the database along with translation systems
known in the art. The same basic principle is used to confirm a Target Language
translation: a user-defined number of common indirect Target Language translation
results from different third languages is required.

The number of common Target Language results required and the number of
intermediary languages used for multilingual leverage are user-defined. The more indirect
translations through other languages used to verify translations of a word string or any
other data segment, the more certain it is that the present invention will produce an
accurate translation. As a final check for confirmation, based on user-defined criteria,
Target Language translation results can be translated back to the Source Language using
one or more third languages using the same technique as described above. If the translation
back into the Source Language is either the original Source Language word string to be
translated, or determined to be a semantic equivalent of the original Source Language word
string (using Common Frequency Analysis of the present invention, which is described later),
the translation into the Target Language is approved.

C. Acquisition Using Target Document Flooding 

Another aspect of the present invention builds associations between word strings 
of different languages using a monolingual corpus in the Target Language and/or Parallel 
Text, along with any one or more of the following: machine translation systems known in
the art, cross-language dictionaries known in the art, and/or custom-built cross-language
dictionaries. These methods, which use the "Flooding" technique of the present
invention, generate potential Target translations of the individual words of each word
string parsed from a Source query using custom-built systems or systems known in the
art, as mentioned above (even though some of the potential word translations are likely
to be wrong), and then search Target Language documents for different combinations
of the potential word translations (Target translations of Source words may be words or
phrases) to produce a list of translation candidates for the Target word string.

In another embodiment using the Flooding technique, Source Language
co-locations and idioms made up of two or more words are included in the dictionary. In
this embodiment, each Source Language query word string is first tested to identify any
known idiomatic or co-location word strings that make up part or all of the query word
string. If an idiom or co-location is identified in the query, the translation of the idiom or
co-location is retrieved from the dictionary and used as part of the Flooding process to
search the Target corpus instead of using the translations of the individual words that
make up the idiom or co-location. Any other Source Language word string
can be added to the dictionary as well and translated into the Target Language for use in
the Flooding process instead of translating those words individually.

1. Parallel Text Flooding 

In one embodiment, Parallel Text is used along with a translation system known in
the art (or a cross-language dictionary). To build Target Language associations for word
strings in the Source Language, locate each word string's occurrence in the Source
Language documents and establish corresponding ranges in the Parallel Text Target
Language documents. The Target Language ranges are established in the same manner as
they are when building cross-language associations using Parallel Text as described
previously. A translation (or translations, if multiple systems are used) for the Source
Language query word string is generated using a machine translation system known in
the art, dictionary known in the art, or custom-built dictionary. The ranges in the Target
Language documents are then searched using the translations (even though some of the
translations are likely to be wrong), to identify words and word strings that are translation
candidates. If any one of the identified word or word string translation candidates is
found in a user-defined number or percentage of the ranges Flooded, that association may
be approved as a translation. If a cross-language dictionary is used instead of a machine 
translation engine known in the art, each word of a Source Language word string is 
translated using all possible known translations of each word (Target translations of 
Source words can be words or phrases, as mentioned above), and different combinations 
of the word translations are identified within the ranges in the Target Language of the 
Parallel Text using the method described in the next section for Target Language 
Flooding. In addition, the Source Language query word string can be searched for idioms 
or co-locations (using the Source Language entries of a cross-language dictionary of 
idioms and co-locations); if the Source Language query word string contains an idiom 
and/or co-location, the translation can be used to Flood the Target corpus along with the 
word-for-word (and/or word-for-phrase) translation possibilities, as described herein. 

2. Target Language Flooding 

Using another method and embodiment of the Flooding technique, word strings can 
be translated from the Source Language to the Target Language by translating each word 
of the word string using a cross-language dictionary (or translation system known in the 
art) and searching for groups of those translated words in all available Target Language 
word strings using a Target Language corpus. This method does not rely on Parallel Text 
and requires only a large Target Language corpus (e.g., a document database, the world 
wide web). The need for only a corpus comprised of Target Language documents 
without translation counterpart documents in another language expands the opportunities 
for the present invention to identify word string associations across languages. As with
all methods of the present invention that identify word string translations, word strings to
be translated may be parsed from a Source document into word strings of user-defined
size (i.e., number of words in the string) with a user-defined minimum number of
overlapping words (as described later) to generate word strings for translation analysis
on-the-fly, or word strings can be examined for addition to a translation knowledgebase.

Using the Target Language Flooding method, first, each word of a word string 
(the Source Language query word string) is translated to the Target Language on a word- 
for-word (and/or word-for-phrase) basis using a cross-language dictionary (or other 
translation system known in the art). The dictionary will often offer multiple options or
candidates, and all Target Language translation candidates provided by the dictionary for
each word of a word string being analyzed are identified. The dictionary may also
contain translations for a Source Language word that translates into a Target Language
word string (i.e., phrase). In this case, the word string will be treated as a single unit
for searching the Target Language corpus. The dictionary may also be populated with
translations of common Source Language idioms and co-locations. The Source Language
query word string can be searched for idioms or co-locations, and if the Source Language
query word string contains an idiom and/or co-location, their translations can also be used
to Flood the Target corpus, as described herein. Flooding the Target corpus using idiom
and/or co-location translation candidates can be done either before or along with the
Flooding process described herein that uses translation candidates generated on a
word-for-word (and/or word-for-phrase) basis. In addition, if the invention is being used for a
Source Language where certain combinations of words can be combined in some way to
form one word, the system can be adjusted to parse those kinds of words into the two or
more individual components to be translated into two or more individual Target
Language words.

For example, in Hebrew, instead of having an independent word for "and," a
Hebrew letter (the Hebrew letter "vuv") that means "and" is attached to the front of the
word it refers to. In this case the invention could separate the leading "vuv" from
the rest of the word and generate a translation for "and," and a translation for the rest of
the Hebrew word that "vuv" referred to. Additionally, if words are translated into the
Target Language individually using a translation system known in the art, these systems
typically produce two or more Target Language words for those word combination
examples in the Source Language. Rules for different languages involving word
combinations, word conjugations and other root word variations for tense, singular,
plural, and the like, can be codified to expand the dictionary words used and accurately
represent the semantic units to be searched in the Target Language corpus.

Next, after individual Target Language word translations are generated for each
word (or idiom or co-location) in the Source Language query word string, the system
searches a Target Language corpus for word strings of a user-defined maximum length
containing a user-defined minimum number (or percentage) of translation candidates
generated for each word of a Source Language query word string (in addition to any other
user-defined search criteria). No more than one of the candidate translations generated
for each Source Language word is counted in the Target Language word string toward
satisfying the user-defined search requirements. A Target Language word string of
user-defined maximum length will qualify if it contains any combination, found in any order,
of the user-defined minimum number of candidates generated by the different Source
Language words.

Qualifying word string returns form what is referred to as the "Query String
Flooding List." Additionally, user-defined requirements can set the parameters for the
Query String Flooding List based on the proximity of Source Language words and their 
Target Language counterparts. For example, user-defined parameters can require a 
Target Language translation of a Source Language word to be found within a user- 
defined number of words of a Target Language translation of an adjoining Source 
Language word. Candidates can be retrieved based on other user-defined search 
parameters, including the relationship between the distance between individual words in a 
Source Language word string and the distance between their respective translations in the 
Target Language word string translation candidates. Moreover, any user-defined 
parameters can incorporate these and/or other factors in the ranking of Target Language 
translation candidates. These settings for qualification and ranking will vary depending 
upon language pair based on the relationship between the two languages' structures. 

To illustrate the Flooding technique using only a Target Language corpus,
consider a four-word word string in Language X to be translated:

"aa bb cc dd"

The system would translate each word in the string to the Target Language, 
Language Y. Assume the cross-language dictionary had the following Language Y 
definitions for each word in the above Language X word string: 

Language X Word    Language Y Translations

aa                 AA1, AA2, AA3, AA4, AA5, AA6
bb                 BB1, BB2, BB3
cc                 CC1, CC2, CC3, CC4
dd                 DD1, DD2, DD3, DD4, DD5

The system would then search a corpus of Target Language documents to locate a 
user-defined minimum number of the translations of the words (but only one candidate 
for any specific Source word counts toward the minimum) in a user-defined range. In 
this example, assume the parameters are set such that a minimum of three of the 
translated words (counting only one translation for any Source Language word) must be 
found within a string of six or fewer total words, regardless of the word position or order 
in which they are found. A partial list of some possible qualifying word strings found in a 
hypothetical Target Language corpus for this example might be: 

Query String Flooding List (partial) 

1. DD1 AA2 CC2 BB3

2. AA1 BB1 CC3 EE1

3. BB2 FF1 KK1 AA2 LL3 DD5

4. DD4 PP1 UU1 AA6 CC4 BB2

5. CC1 KK1 RR2 BB3 DD4

6. BB1 CC3 EE1 DD4 
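
The search that produces such a list can be sketched as a sliding window over the
Target Language text. The Python below is a minimal sketch under stated assumptions: the
names CANDIDATES, covered_source_words and flood are hypothetical, every candidate is a
single word, the candidate sets of different Source words do not overlap, and the corpus
is a plain list of strings whose windows do not cross string boundaries.

```python
# Dictionary entries from the example above (Language X word -> Language Y candidates).
CANDIDATES = {
    "aa": {"AA1", "AA2", "AA3", "AA4", "AA5", "AA6"},
    "bb": {"BB1", "BB2", "BB3"},
    "cc": {"CC1", "CC2", "CC3", "CC4"},
    "dd": {"DD1", "DD2", "DD3", "DD4", "DD5"},
}


def covered_source_words(tokens, candidates):
    """Source words that have at least one candidate among the tokens."""
    present = set(tokens)
    return {src for src, options in candidates.items() if present & options}


def flood(corpus, candidates, min_hits=3, max_len=6):
    """Collect every distinct word string (window) that qualifies."""
    results, seen = [], set()
    for text in corpus:
        tokens = text.split()
        for start in range(len(tokens)):
            longest = min(max_len, len(tokens) - start)
            for length in range(min_hits, longest + 1):
                window = tokens[start:start + length]
                key = " ".join(window)
                if key in seen:
                    continue
                # Counting Source words rather than tokens means several
                # candidates for the same Source word are credited once.
                if len(covered_source_words(window, candidates)) >= min_hits:
                    seen.add(key)
                    results.append(key)
    return results


corpus = ["DD1 AA2 CC2 BB3", "AA1 AA2 AA3 EE1"]
print(flood(corpus, CANDIDATES))
```

The second corpus string contains three candidates, but all of them translate the same
Source word "aa", so it does not qualify. The first string contributes itself and its two
three-word sub-strings.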

The returns for the Query String Flooding List can be further expanded by 
identifying any two results on the list that combine through overlap of a word string to 
form a larger word string result. These word string combinations can be added to the
Query String Flooding List as possible word string translations. For example, in the
above list of returns, the second return "AA1 BB1 CC3 EE1" and the sixth return "BB1 
CC3 EE1 DD4" can combine through overlapping word strings to form "AA1 BB1 CC3 
EE1 DD4" which can be added to the Query String Flooding List. 
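
Combining two returns through an overlapping word string can be sketched as a
suffix-to-prefix match. This is a minimal illustration, assuming whitespace-separated
words; the function name merge_on_overlap and the parameter min_overlap are hypothetical
and do not appear in the disclosure.

```python
def merge_on_overlap(first, second, min_overlap=1):
    """Join two word strings if the end of `first` matches the start of `second`.

    Returns the combined word string, or None when fewer than `min_overlap`
    words overlap. The longest possible overlap is tried first.
    """
    a, b = first.split(), second.split()
    for size in range(min(len(a), len(b)), min_overlap - 1, -1):
        if size > 0 and a[-size:] == b[:size]:
            return " ".join(a + b[size:])
    return None


print(merge_on_overlap("AA1 BB1 CC3 EE1", "BB1 CC3 EE1 DD4"))
```

Here the three shared words "BB1 CC3 EE1" join the second and sixth returns into the
five-word string "AA1 BB1 CC3 EE1 DD4".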

Returns on the Query String Flooding List are ranked based on user-defined
criteria which typically include at least (1) the largest number (or percentage) of Source
word translations in the Target Language string (counting only one Target Language
translation for each Source Language word) and (2) the smallest Target Language word
strings (fewest words) that meet the first user-defined criterion for minimum number of
Source Language word translations. For example, based on these two criteria (and
weighting the first more than the second), the above returns could be ranked as follows:

1. DD1 AA2 CC2 BB3

2. AA1 BB1 CC3 EE1 DD4 

3. DD4 PP1 UU1 AA6 CC4 BB2 

4. AA1 BB1 CC3 EE1 

5. BB1 CC3 EE1 DD4 

6. CC1 KK1 RR2 BB3 DD4 

7. BB2 FF1 KK1 AA2 LL3 DD5 

The above rankings reflect a user-defined weighting of the first criterion
(number of translated words in a word string) above the second criterion (smallest
word strings meeting the first criterion). The first ranked result has all four translated
words in a four-word word string. The second ranked result is the word string that was created
(and added to the Query String Flooding List) by overlapping other returns, and contains
all four translated words in a five-word word string. The third ranked result has all four
translations in a six-word word string. Results ranked four and five are tied because both
word strings contain three of the four translated words in a four-word word string. The
sixth ranked result has three translated words in a five-word word string and the last
ranked result has three translated words in a six-word word string.
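
The two-criterion ranking can be expressed as a sort on a pair of keys. The sketch below
is one possible reading, not a definitive implementation: the names CANDIDATES,
translated_word_count and rank are assumptions, and giving the first criterion strict
priority over the second is only one way of "weighting the first more than the second."

```python
# Dictionary entries from the example (Language X word -> Language Y candidates).
CANDIDATES = {
    "aa": {"AA1", "AA2", "AA3", "AA4", "AA5", "AA6"},
    "bb": {"BB1", "BB2", "BB3"},
    "cc": {"CC1", "CC2", "CC3", "CC4"},
    "dd": {"DD1", "DD2", "DD3", "DD4", "DD5"},
}


def translated_word_count(word_string):
    """Number of Source words with at least one candidate in the word string."""
    present = set(word_string.split())
    return sum(1 for options in CANDIDATES.values() if present & options)


def rank(returns):
    """Most translated Source words first, then fewest total words."""
    # sorted() is stable, so tied returns keep their original list order.
    return sorted(returns, key=lambda s: (-translated_word_count(s), len(s.split())))


returns = [
    "DD1 AA2 CC2 BB3",
    "AA1 BB1 CC3 EE1",
    "BB2 FF1 KK1 AA2 LL3 DD5",
    "DD4 PP1 UU1 AA6 CC4 BB2",
    "CC1 KK1 RR2 BB3 DD4",
    "BB1 CC3 EE1 DD4",
    "AA1 BB1 CC3 EE1 DD4",  # added by overlapping returns 2 and 6
]
for position, word_string in enumerate(rank(returns), start=1):
    print(position, word_string)
```

Applied to the six returns plus the combined string, this ordering reproduces the seven
ranked results listed above, with the two tied four-word strings kept in list order.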

Additionally, user-defined criteria based on the distance between Source 
Language words and their Target Language counterparts can be established. For 
example, if user-defined criteria required that translations for contiguous Source 
Language words be within three words of each other or less to qualify for the Query
String Flooding List, the third (DD4 PP1 UU1 AA6 CC4 BB2) and sixth (CC1 KK1 RR2
BB3 DD4) ranked members would be eliminated. Note that a smaller word string that is
a subset of the third ranked result would qualify for the Query String Flooding List (i.e.,
words four through six of the word string - DD4 PP1 UU1 AA6 CC4 BB2). Also note
that when a Source Language word (or co-location or idiom) translates into a Target
Language word string, the Target Language word string is always treated as a single unit
(i.e., words in the word string must remain contiguous and in the same order) for the 
purpose of Flooding the Target Language corpus (except for occasional cases based on 
the particular characteristics of a language where all the words in the Target translation 
will not be contiguous). 
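
The distance criterion in the example above can be sketched as a check on each pair of
adjoining Source words. The interpretation used here is an assumption: distance is
measured as the difference in word positions, and a pair is skipped when either Source
word has no translation in the Target word string. The names SOURCE_ORDER and
passes_proximity are hypothetical.

```python
CANDIDATES = {
    "aa": {"AA1", "AA2", "AA3", "AA4", "AA5", "AA6"},
    "bb": {"BB1", "BB2", "BB3"},
    "cc": {"CC1", "CC2", "CC3", "CC4"},
    "dd": {"DD1", "DD2", "DD3", "DD4", "DD5"},
}
SOURCE_ORDER = ["aa", "bb", "cc", "dd"]  # word order of the Source query


def passes_proximity(word_string, max_gap=3):
    """True if translations of adjoining Source words lie within max_gap words."""
    position = {}
    for index, token in enumerate(word_string.split()):
        for src in SOURCE_ORDER:
            if token in CANDIDATES[src]:
                position.setdefault(src, index)  # keep the first occurrence
    for left, right in zip(SOURCE_ORDER, SOURCE_ORDER[1:]):
        if left in position and right in position:
            if abs(position[left] - position[right]) > max_gap:
                return False
    return True


ranked = [
    "DD1 AA2 CC2 BB3",
    "AA1 BB1 CC3 EE1 DD4",
    "DD4 PP1 UU1 AA6 CC4 BB2",
    "AA1 BB1 CC3 EE1",
    "BB1 CC3 EE1 DD4",
    "CC1 KK1 RR2 BB3 DD4",
    "BB2 FF1 KK1 AA2 LL3 DD5",
]
print([s for s in ranked if not passes_proximity(s)])
```

Under this reading the third and sixth ranked members fail because the translations of
"cc" and "dd" are four positions apart, while the three-word sub-string "AA6 CC4 BB2"
passes.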

Another embodiment of the invention for ranking Query String Flooding List
returns can use a point system and add points for each word in the Target Language word
string that is a translation of a Source Language word from the Source Language query
word string, and deduct points for each word in a qualifying Target Language word string
that is not a translation of one of the words in the Source Language query word string.
Moreover, a word can count for more or fewer points based on its general frequency in
language. For instance, non-stop words can be weighted more than stop words.

For example, user-defined settings may score each Target Language word string
on the Query String Flooding List (1) by adding or subtracting 5 points for each stop
word that appears in the Target Language word string based on whether or not it is a
translation of a Source Language word from the Source Language query word string, and
(2) by adding or subtracting 20 points for a non-stop word (i.e., a word that isn't a
frequently recurring word like "it", "and", or "the") that appears in the Target Language
word string return based on whether or not it is a translation of a Source Language word
from the Source Language query word string.

To illustrate this scoring using the previous example, assume "aa" and "cc" are 
stop words, and "bb" and "dd" are not stop words. In this example, under the above
user-defined scoring parameters, the word string "AA1 BB1 CC3 EE1" would have a score of
25 if EE1 is a stop word (5+20+5-5=25), and it would have a score of 10 if EE1 were not
a stop word (5+20+5-20=10). Any other scoring scheme based on the number of words 
translated from the Source Language query word string and found on a word string on the 
Query String Flooding List can be used. 
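
The point system in this example can be sketched in a few lines. This is a minimal
illustration: the function name score_word_string and its parameters are hypothetical,
and stop-word status is modeled as a set of Target Language words, on the assumption
that the translations of the stop words "aa" and "cc" are themselves stop words.

```python
def score_word_string(word_string, translations, stop_words,
                      stop_points=5, non_stop_points=20):
    """Add points for words that translate a Source word, deduct for the rest."""
    total = 0
    for token in word_string.split():
        weight = stop_points if token in stop_words else non_stop_points
        total += weight if token in translations else -weight
    return total


# Target words in the example that translate a Source query word.
translations = {"AA1", "BB1", "CC3"}
# Translations of the Source stop words "aa" and "cc".
stops = {"AA1", "CC3"}

print(score_word_string("AA1 BB1 CC3 EE1", translations, stops | {"EE1"}))
print(score_word_string("AA1 BB1 CC3 EE1", translations, stops))
```

The first call treats EE1 as a stop word and yields 25; the second treats it as a
non-stop word and yields 10, in agreement with the arithmetic given above.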

Returns produced at this point in the process will include correct, partially correct,
and incorrect Target Language translation word strings. As described later, the present
invention translates a Source document by parsing the document into overlapping word
strings and combining Target Language word string translations that overlap. The
requirement of large overlapping word strings (i.e., many words) between translation
word strings will eliminate returns on a Query String Flooding List that are not correct
translations of a word string because they do not have a user-defined sized overlap with 
other word string translations (as described later). 

Returns on the Query String Flooding List, or any returns (using any method) that
have not reached user-defined criteria to be confirmed as accurate translations, can be
used in large overlapping chains, as described later, but only if the word strings that are
the first and last word strings of a translation unit have been confirmed previously as
accurate word string translations. Alternatively, the word string to the extreme left of a
translation must be accurate on its left side and the word string to the extreme right of the
larger translation must be accurate on its right side. Large overlapping (described later)
unconfirmed translations sandwiched between two translations that are known to be 
accurate word string translations, or are at least confirmed on their far edges, can provide 
the basis of an accurate translation. 

The Query String Flooding List can be refined by eliminating returns that are not
correct translations, without testing for overlapping word strings, by performing the same
Query String Flooding analysis as described above on larger word strings that include the
original query word string plus additional words on each side. This embodiment will
require a Source Language corpus that contains the Source Language query word string
along with surrounding context words and/or word strings, but this Source Language
corpus need not be Parallel Text documents to the Target Language corpus. Using this
method to continue the example above, the system would search Source Language text
for a user-defined number of Source word strings containing the word string "aa bb cc
dd" plus a user-defined number of words on either side. User-defined criteria can require
that these longer Source word strings be parsed into a user-defined number of additional
segments of user-defined size containing "aa bb cc dd" and then used to Flood Target 
Language documents as described above. 

If, for example, the user requests five word strings each with three words on each
side of the original string, the five Source word strings returned using a Source Language
corpus might be:

1. "zz xx yy aa bb cc dd ll mm nn"

2. "kk rr ll aa bb cc dd aa kk oo"

3. "kg lh wk aa bb cc dd ql io rr"

4. "ck nk ak aa bb cc dd bk sk jk"

5. "dm ea jc aa bb cc dd tg ms jf"

This process would then parse the above strings into a user-defined number of
word strings of user-defined size (in this example, a minimum of 5 words) to create
Source Language word strings to be used to Flood the Target Language corpus based on
user-defined criteria described below. If all possible parsings of the strings containing
the original query are required by the user for analysis, the following parsed word
combinations will be generated for the first word string identified above:

"zz xx yy aa bb cc dd ll mm nn"
"zz xx yy aa bb cc dd ll mm"
"zz xx yy aa bb cc dd ll"
"zz xx yy aa bb cc dd"

"xx yy aa bb cc dd ll mm nn"
"xx yy aa bb cc dd ll mm"
"xx yy aa bb cc dd ll"
"xx yy aa bb cc dd"

"yy aa bb cc dd ll mm nn"
"yy aa bb cc dd ll mm"
"yy aa bb cc dd ll"
"yy aa bb cc dd"

"aa bb cc dd ll mm nn"
"aa bb cc dd ll mm"
"aa bb cc dd ll"

Potential Target Language translations for each of these word strings would be 
produced using the Flooding process described above. Each word string is analyzed by 
translating each word individually using a dictionary or an existing machine translation 
system and searching Target Language documents for Target Language word strings 
containing translations of the individual words, based on user-defined requirements for 
minimum number of word translations within a maximum number of words (and/or other 
requirements). The lists of Target returns generated are referred to as the "Query + 
Context Flooding Lists." The system would then generate Query + Context Flooding 
Lists for each remaining parsed word string derived from each of the original Source 
word strings (i.e., the Source word string query plus left and right context words; in this
example, the remaining four ten-word word strings (2 through 5) identified above).
Alternatively, a greater number of word strings with a context word or user-defined sized
context word string to the right and left of the query word string can be generated by
searching the Source Language corpus, and each string can be used in its entirety to
create a Query + Context Flooding List without further parsing into smaller word strings.
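
The parsing of a query-plus-context string into all word strings that contain the
original query can be sketched as follows. This is an illustrative sketch only: the
function name context_parsings and the parameter min_words are assumptions, and the
example uses the fourth Source word string listed above.

```python
def context_parsings(context_string, query, min_words=5):
    """All contiguous word strings of at least min_words that contain the query."""
    tokens, q = context_string.split(), query.split()
    starts = [i for i in range(len(tokens) - len(q) + 1)
              if tokens[i:i + len(q)] == q]
    if not starts:
        return []
    start = starts[0]          # first occurrence of the query
    end = start + len(q)       # index just past the query
    parsings = []
    for left in range(start + 1):                      # shrink left context
        for right in range(len(tokens), end - 1, -1):  # shrink right context
            if right - left >= min_words:
                parsings.append(" ".join(tokens[left:right]))
    return parsings


for parsed in context_parsings("ck nk ak aa bb cc dd bk sk jk", "aa bb cc dd"):
    print(parsed)
```

With three context words on each side and a five-word minimum, the sketch yields fifteen
parsings: every combination of zero to three left and right context words except the
bare four-word query itself. Each parsing would then be Flooded as described above.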

Next, the system uses each of the results from the Query String Flooding List and 
searches for each as a sub-string of a larger word string on all of the Query + Context
Flooding Lists generated from all Source Language word strings made up of the original
query plus left and/or right context word strings. The system counts the total number of 
times a return from the Query String Flooding List appears as a sub-string of a longer 
word string result (or independently) on the Query + Context Flooding Lists. 

These counts are then adjusted by subtracting the number of times a smaller word
string (on the Query String Flooding List) appears as part of a larger word string (on the
Query String Flooding List). For example, assume both word strings "DD1 AA2 CC2"
and "DD1 AA2 CC2 BB3" are on the Query String Flooding List. If word string "DD1
AA2 CC2" appears 120 times as a sub-string of the word strings on the Query + Context
Flooding Lists, and "DD1 AA2 CC2 BB3" has a count of 100, the frequency count for
"DD1 AA2 CC2" would be adjusted by subtracting the number of times it appeared as
part of the larger string "DD1 AA2 CC2 BB3", i.e., 120 minus 100 equals 20. This
subtraction adjustment is conceptually similar to the subtraction adjustment made when
using the method to build cross-language associations using Parallel Text that subtracts
occurrences of smaller word strings that are part of a larger recurring word string, as
illustrated in Figure 1.
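
The subtraction adjustment can be sketched as follows. The text specifies only the
two-string case; the handling of longer chains here (processing the longest strings
first and subtracting their already adjusted counts) is an assumption, as are the names
contains and adjust_counts. The sketch also assumes a shorter string occurs at most once
inside each longer string.

```python
def contains(longer, shorter):
    """True if the word list `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def adjust_counts(raw_counts):
    """Remove from each count the occurrences that belong to longer listed strings."""
    adjusted = {}
    # Longest strings first, so shorter strings see final counts of longer ones.
    for word_string in sorted(raw_counts, key=lambda s: len(s.split()), reverse=True):
        words = word_string.split()
        inside_longer = sum(
            count for other, count in adjusted.items()
            if len(other.split()) > len(words) and contains(other.split(), words)
        )
        adjusted[word_string] = raw_counts[word_string] - inside_longer
    return adjusted


raw = {"DD1 AA2 CC2": 120, "DD1 AA2 CC2 BB3": 100}
print(adjust_counts(raw))
```

For the counts in the example, the shorter string is reduced from 120 to 20 while the
longer string keeps its count of 100.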

The word strings on the Query String Flooding List are then re-ranked based on
the total number of times each result was found as a sub-string of a larger word string (or
independently) on the Query + Context Flooding Lists (after the subtraction adjustment
described in the previous paragraph). Alternatively, user-defined parameters may require
that the rankings be based partly on certain other factors including the number of words
in the context word strings the result is found in as a sub-string and the balance between
the number of times the sub-string is part of a left context word or word string and the
number of times the sub-string is part of a right context word or word string.

At this stage in the process, if user-defined parameters require that only the left-side
or "edge" word string of a larger translation query be confirmed as accurate, because
it is the first word string in a chain of large overlapping word strings, then only left
context words or word strings will be used for Query + Context Flooding Lists. If it is
the right-side word string in a long chain of overlapping word strings, then only right-side
context words and word strings will be produced along with the query for Query +
Context Flooding Lists.

As an alternative embodiment, Query + Context Flooding Lists can be generated
without generating a Query String Flooding List. Instead, each word string on a Query +
Context Flooding List is treated as a Target Language range as used in cross-state
learning using Parallel Text, and each is analyzed for recurring word strings in the same
way. The counts of recurring word strings are tabulated and adjustments to the counts of
shorter strings are made by subtracting the number of times they appeared as part of
longer strings. If this method is employed, Query + Context Flooding Lists should be
generated using different context words or word strings (rather than parsing the same
strings in different sizes) for best results. Alternatively, parsing of context strings can be
used, but translation of context words in context word strings would be ignored by the
system for counting recurring word strings among the members of the Query + Context
Flooding Lists.

There are other methods for improving Query String Flooding Lists. One of these 
methods involves generating close semantic equivalents for the query using the Common 
Frequency Analysis aspect of the present invention described later. Once additional 
Source Language word strings that represent ideas semantically similar to the query are 
generated, a cross-language dictionary can be used to perform the previously described 
Flooding technique on each option. This technique expands the number of Source 
Language translation options and is particularly useful when the original query word 
string involves an idiomatic expression (that is not in the cross-language dictionary) 
where the individual words may lose their semantic character completely.

The same process can be performed on each of the highest ranking results on the 
Query String Flooding List. A user-defined number of Target Language word strings on 
the Query String Flooding List (e.g., the top five) can be used to build a user-defined 
number of semantically similar Target Language word strings (e.g., five for each) using 
the aspect of the present invention that identifies semantically similar word strings, 
described later. These groups of synonymous word strings can be used to find common 
strings across multiple lists for confirmation of the word string translations that satisfy 
user-defined minimums for number or percentage of common word strings on any 
return's semantic equivalent list (described later). Additionally, these groups of 
synonymous word strings can be translated word-for-word back to the Source Language 
to see which group has the highest number of translations in common with the group of 
word strings synonymous to the Source Language query (as well as the query itself). The
group of synonymous Target Language sentences that has the highest number of words
translated back to the Source Language that match the Source Language word string or its
synonyms is the correct group of Target Language translations.

An additional method for refining the Query String Flooding List involves the use 
of the multilingual leverage technique in conjunction with the Flooding technique. In this 
embodiment, the Source Language query word string can be translated word-for-word 
(and/or word-for-phrase), using all possible translations for each word, into one or more 
third languages and each third language corpus of text is Flooded by searching for 
sentences and other word strings containing a user-defined minimum number of 
translated words within a maximum user-defined number of total words, as described 
above. Qualifying third language word strings are then translated word-for-word (and/or 
word-for-phrase) into the Target Language to be used to search for Target Language 
word strings that meet user-defined Flooding criteria described above. Alternatively, the 
translated words in the third language can be directly translated into the Target Language 
to be used to search for qualifying Target Language word strings, without searching the 
third language corpus for third language word strings as described in the previous step. 
Word strings found in the Target Language that qualify for the Query String Flooding
List using more than one intermediate third language lend further confirmation of
translations. Synonymous word strings in Source Language, Target Language, and
intermediate third languages can be generated and used with a cross-language dictionary 
to further confirm translations as described above. 

The multilingual leverage aspect of the present invention will also be useful to
build and expand word level dictionaries for use in the Target Language Flooding
embodiments of the present invention, as well as for any other purpose. If several
dictionaries known in the art or custom-built are incomplete, either because a Source
Language word does not have an entry or because its entry does not have a complete list
of potential Target Language translations, the present invention can supplement the
dictionary by using known translations of Source Language words into one or more third
languages. The system can then take all the third language words and identify known
Target Language translations. The most frequent Target Language translations produced
using intermediate third languages are approved as translations. User-defined criteria
determine how many common results qualify as a translation. Alternatively, a human
editor can evaluate the list produced and eliminate incorrect translations if desired.
Moreover, dictionaries can also be built using the methods and system for cross-language
frequency association by examining single words in the Source Language. Target
Language translation entries can also be expanded by the use of the method of the
present invention that identifies semantically similar words and word strings within a
single state or language using Common Frequency Analysis (described later).
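The dictionary-supplementing step through intermediate third languages can be sketched as a simple vote. This is only an illustrative sketch under stated assumptions: the function name `pivot_translations`, the `min_votes` parameter (standing in for the user-defined criteria), and the abstract tokens are all hypothetical, and each third language is assumed to contribute at most one vote per Target Language candidate.

```python
from collections import Counter


def pivot_translations(word, src_to_third, third_to_tgt, min_votes=2):
    """Approve Target Language candidates reached through enough third languages.

    src_to_third[i] maps Source words to words of the i-th third language;
    third_to_tgt[i] maps words of that third language to Target words.
    """
    votes = Counter()
    for to_third, to_target in zip(src_to_third, third_to_tgt):
        candidates = set()
        for third_word in to_third.get(word, []):
            candidates.update(to_target.get(third_word, []))
        votes.update(candidates)  # one vote per third language per candidate
    return [target for target, n in votes.most_common() if n >= min_votes]


# Hypothetical dictionaries for two third languages, A and B.
src_to_third = [{"SRC1": ["a1", "a2"]}, {"SRC1": ["b1"]}]
third_to_tgt = [{"a1": ["t1", "t2"], "a2": ["t1"]}, {"b1": ["t1", "t3"]}]

approved = pivot_translations("SRC1", src_to_third, third_to_tgt, min_votes=2)
```

Here only "t1" is reached through both third languages, so it is the only candidate approved when two common results are required. Lowering `min_votes` to 1 would return all three candidates for review by a human editor.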

D. Acquisition Using Multi-Method Differential 

If any method used to identify cross-state associations produces a word string
translation candidate that does not yet meet user-defined criteria for near statistical
certainty as a correct translation, the partial results of two or more methods can be used
together to confirm the association as a correct translation, or failing that, to move on to
the next candidate translation. This will be desirable in cases where the text available for
analysis does not have enough relevant word strings to approach statistical certainty. It
will also be useful to employ partial results from different methods to confirm word
string translations as a way to build associations using fewer calculations (which will
save processing power and time). Additionally, as indicated above, the method of the
present invention that identifies semantically equivalent word strings can be used to assist
any of the methods for translation of word strings of the present invention or of any other
system to identify or confirm word string translations.

It should also be noted that the present invention is able to keep track of the results
obtained under user-defined parameters for determining approved results for translations
(as well as semantic equivalents described later and any other output of methods of the
present invention). Evaluating these results allows the system to automatically determine
efficient parameter settings. These requirements will often include a combination of
methods to provide combined near statistical certainty that a return is accurate.



III. CROSS-STATE KNOWLEDGE RECONSTRUCTION METHOD AND
APPARATUS

Another aspect of the present invention is directed to providing a method and
apparatus for creating a second document comprised of data in a second state, form, or
language, from a first document comprised of data in a first state, form, or language, with
the end result that the first and second documents represent substantially the same ideas
or information, and wherein the method and apparatus includes using a cross-idea
association database. Database entries may be "pre-built" or may be built on an "as
needed" basis (on-the-fly) using any method of the present invention.

One embodiment of the translation method utilizes a dual-anchor overlap
technique to obtain an accurate translation of ideas from one state to another. An
alternative embodiment would allow the approval of contiguous segments in the Target
Language without a Target Language overlap from direct translations of overlapping
Source Language word strings if indirect translations through a third language and then
into the Target Language overlapped in the third language and their translations
overlapped in the Target Language as well. The present invention, using the dual-anchor
overlap technique, enables the building block word strings in a second language, form or
state to be connected together organically and become accurate translations in their
correct context in the exact manner those words and phrases would have been written or
spoken by a native speaker of the second language. This technique resolves the issue of
boundary friction encountered by existing EBMT systems.

In an embodiment of the present invention, the methods for word string
association database creation and the overlap technique are combined to provide accurate
language translation of documents of any length. By parsing any Source Language input
into a series of word strings each with a user-defined number of overlapping words with
both of the parsed word strings before and after it, and testing translations of those word
strings in a Target Language for overlapping words or word strings, the present method
and system can translate documents by piecing together building block ideas in a chain.
Requiring more overlapping words through user-defined settings results in a more
accurate combination of word string translations in a Target Language.

Moreover, the results of word string translations assembled either manually or
through any automatic method including any of the methods of the present invention used
to build word and word string associations across language (e.g., using Parallel Text,
multilingual leverage, Target Language Flooding, etc.) can be tested for accuracy by
requiring greater word string overlap (i.e., more overlapping words) with the neighboring
word strings on both sides of the word string translation when it is taken as part of a
larger translation query (as long as they are anchored by known word string translations
on both sides). The dual-anchor overlap technique will not permit otherwise
semantically correct translations that do not fit the specific context of a larger translation
query; furthermore, the dual-anchor overlap will eliminate semantically incorrect
translations. Therefore, the dual-anchor overlap technique can be used to confirm or
eliminate a candidate word string translation identified by any cross-language association
method of the present invention when that method alone has not reached a point of
user-defined confirmation for a word string translation. For example, if a Source document is
parsed only in segments of word strings with full overlap of all words of each word
string, and the far left and far right word string translations are known to be accurate, no
Target Language translation candidate will be accepted that is incorrect for either
semantic or grammatical reasons.

Moreover, once word string translation candidates are approved through large
overlaps anchored by known word string translations, these newly confirmed word string
units can be added to the database as known accurate translations. Additionally, word
strings in the overlap across two languages of two known word string translations can be
approved as an independent word string translation.

A. Document Translation through Use of an Association Database and
Dual-Anchor Overlap Technique

As another preferred embodiment, the present invention can translate a document
in a first language into a document in a second language by using a cross-language
database as described above. Entries may exist for a word string translation or can be
built on-the-fly using any of the above methods to build word string translations across
languages.

One embodiment of this aspect of the present invention starts by locating the
longest word strings that begin each sentence of the document to be translated (Source
document) along with all of their potential translations that meet user-defined criteria
using any of the above methods for identifying potential Target Language word string
translations. Next, the method identifies a second word string for each of the sentences of
the document to be translated (Source document), with a user-defined number of
overlapping words with the previously identified word string, along with their potential
translations (the user defines the length of the overlap (i.e., number of words) that is
required). If a Target word string translation of the second identified word string of a
sentence (in the Source Language) has a user-defined minimum overlap with one of the
first word string translations of the sentence, the combination of translations is approved
as a combined translation unit. If overlapping translations cannot be produced, different
parsings of Source Language word strings (i.e., different start and/or end positions) with
user-defined minimum overlaps are identified and their respective Target Language
translations are tested for combination through an overlap of a word or user-defined sized
word string. Next, a third word string in the Source Language that has a user-defined
minimum number of overlapping words with the second identified word string in the
Source Language is identified along with its Target Language translations. If any of the
translations of the third identified word string have overlapping words with the
translation of the second identified word string, the combination is approved as a
translation. The next Source Language word string that has a user-defined minimum
number of overlapping words with the previously identified Source Language word string
is identified and the process is repeated until: (1) each overlapping word string (with at
least the minimum user-defined size overlap) from the Source Language document has
been identified along with potential Target Language translations; (2) every word string
in both the Source Language and Target Language has both a right and left overlapping
word string of at least the user-defined minimum size (overlap can also be one word, if
defined by the user), except the initial string which overlaps only on the right, and the
final string which overlaps only on the left; and (3) the longest strings satisfying
properties 1 and 2 above are the ones selected for the final output translation.
Alternatively, shorter Target Language word strings (i.e., strings of fewer words) that
have larger overlaps can be chosen over longer strings with less overlap, based on
user-defined criteria. The tradeoff between overlap ratio and string length is a
programmable parameter subject to manual or automated optimization.

Since word string translations across languages have the appropriate built-in
context for each word in a word string, and since the dual-anchor overlap technique
provides accurate combinations of word string translations, documents can be translated
with levels of accuracy far superior to any existing translation method. The present
invention builds word string building block ideas using association database creation
techniques, and combines them into any number of larger combined ideas through the
cross-language dual-anchor overlap technique.

The cut-off point of a chain to be translated as a translation query unit string using
the dual-anchor overlap technique is user-defined (in the above embodiment, the
translation query unit string is a sentence). For instance, instead of a sentence,
the concept can be broadened to require overlapping translations of word strings across
both Source and Target Language for all contiguous word strings of a shorter unit (e.g.,
between punctuation marks) or a longer unit (e.g., a paragraph, including punctuation).
Because both the beginning and the end of an overlapped unit will only have one side
confirmed by overlap, user-defined criteria when building word string translations may
be more stringent when accepting a first or last word string as a translation. Moreover,
the aspect of the invention that identifies semantically equivalent word strings can be
employed to confirm the translations of any word string (by providing additional checks
of translations of Source and/or Target Language synonyms).

For example, consider a database of Hebrew-English word and word string
translations (built using any of the methods of the present invention or assembled
manually) with the components of the following sentence entered in English and intended
to be translated into Hebrew: "In addition to my need to be loved by all the girls in town,
I always wanted to be known as the best player to ever play on the New York State
basketball team".

Through the process described above, the manipulation method might determine
that the phrase "In addition to my need to be loved by all the girls" is the largest word
string in the Source document beginning with the first word of the Source document and
existing in the database. It is associated in the database to a number of word strings
including the Hebrew word string "benosaf ltzorech sheli lihiot ahuv al yeday kol
habahurot". The process will then determine the following translations using the method
described above - i.e., the largest English word string from the same text (and existing in
the database) with one word (or alternatively, a minimum user-defined size word string)
that overlaps with the previously identified English word string, and the two Hebrew
translations for those overlapping English word strings which have overlapping segments
as well. For example:

"loved by all the girls in town" translates to "ahuv al yeday kol habahurot buir";

"the girls in town, I always wanted to be known" translates to "habahurot buir,
tamid ratzity lihiot yahua";

"I always wanted to be known as the best player" translates to "tamid ratzity lihiot
yahua bettor hasahkan hachi tov"; and

"the best player to ever play on the New York State basketball team" translates to
"hasahkan hachi tov sh hay paam sihek bekvutzat hakadursal shel medinat new
york".

With these returns in the database, the manipulation method will compare
overlapping words and word strings and eliminate redundancies. Utilizing the
technique of the present invention, the system will take the English segments "In addition
to my need to be loved by all the girls" and "loved by all the girls in town", return
the Hebrew segments "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot"
and "ahuv al yeday kol habahurot buir", and determine the overlap.

In English, the phrases are:
"In addition to my need to be loved by all the girls" and "loved by all the girls in town".
Removing the overlap yields: "In addition to my need to be loved by all the girls in
town".

In Hebrew, the phrases are:
"benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot" and "ahuv al yeday kol
habahurot buir". Removing the overlap yields: "benosaf ltzorech sheli lihiot ahuv al
yeday kol habahurot buir".

The present invention then operates on the next parsed segment to continue the
process. In this example, the manipulation process works on the phrase "the girls in
town, I always wanted to be known". The Hebrew corresponding word set is "habahurot
buir, tamid ratzity lihiot yahua". Removing the overlap operates, in English, as follows:
"In addition to my need to be loved by all the girls in town" and "the girls in town, I
always wanted to be known" becomes "In addition to my need to be loved by all the girls
in town, I always wanted to be known".

In Hebrew, the overlap process operates as follows:
"benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir" and "habahurot buir,
tamid ratzity lihiot yahua" yields "benosaf ltzorech sheli lihiot ahuv al yeday kol
habahurot buir, tamid ratzity lihiot yahua".

The present invention continues this type of operation with the remaining words
and word strings in the document to be translated. Thus, in an example of the preferred
embodiment, the next English word strings are "In addition to my need to be loved by all
the girls in town, I always wanted to be known" and "I always wanted to be known as the
best player". Hebrew translations returned by the database for these phrases are:
"benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir, tamid ratzity lihiot yahua"
and "tamid ratzity lihiot yahua bettor hasahkan hachi tov". Removing the English
overlap yields: "In addition to my need to be loved by all the girls in town, I always
wanted to be known as the best player". Removing the Hebrew overlap yields:
"benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir, tamid ratzity lihiot yahua
bettor hasahkan hachi tov".

Continuing the process: the next word strings are "In addition to my need to be
loved by all the girls in town, I always wanted to be known as the best player" and "the
best player to ever play on the New York State basketball team". The corresponding
Hebrew phrases are "benosaf ltzorech sheli lihiot ahuv al yeday kol habahurot buir, tamid
ratzity lihiot yahua bettor hasahkan hachi tov" and "hasahkan hachi tov sh hay paam
sihek bekvutzat hakadursal shel medinat new york". Removing the English overlap
yields: "In addition to my need to be loved by all the girls in town, I always wanted to be
known as the best player to ever play on the New York State basketball team".
Removing the Hebrew overlap yields: "benosaf ltzorech sheli lihiot ahuv al yeday kol
habahurot buir, tamid ratzity lihiot yahua bettor hasahkan hachi tov sh hay paam sihek
bekvutzat hakadursal shel medinat new york", which is the translation of the text desired
to be translated.

Upon the completion of this process, the present invention operates to return and
output the translated final text.
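The "removing the overlap" step used throughout this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present invention: the names `merge_on_overlap` and `chain` are hypothetical, words are compared after stripping trailing punctuation and case (so that "town" matches "town,"), and the right-hand segment's spelling is kept for the shared words.

```python
from functools import reduce


def _norm(word):
    """Comparison key for a word: ignore case and surrounding punctuation."""
    return word.strip(",.;:!?").lower()


def merge_on_overlap(left, right, min_overlap=1):
    """Join two word strings on their longest suffix/prefix overlap, or return None."""
    lw, rw = left.split(), right.split()
    for k in range(min(len(lw), len(rw)), max(min_overlap, 1) - 1, -1):
        if [_norm(w) for w in lw[-k:]] == [_norm(w) for w in rw[:k]]:
            return " ".join(lw[:-k] + rw)  # keep the overlap once
    return None  # no overlap of the required size: the combination is rejected


def chain(segments, min_overlap=1):
    """Merge a sequence of overlapping segments left to right."""
    def step(acc, seg):
        return None if acc is None else merge_on_overlap(acc, seg, min_overlap)
    return reduce(step, segments)


english_segments = [
    "In addition to my need to be loved by all the girls",
    "loved by all the girls in town",
    "the girls in town, I always wanted to be known",
    "I always wanted to be known as the best player",
    "the best player to ever play on the New York State basketball team",
]
english_result = chain(english_segments)
```

Applied to the five English segments, `chain` reproduces the full sentence of the example. The same function can be applied to the Hebrew segments, and a `None` result at any step corresponds to a candidate that would be rejected for lack of overlap with its neighbor.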

It should be noted that the returns were the ultimate result of the database
returning overlapping associations in accordance with the process described above. The
system, through the process, will ultimately not accept a return in the second (Target)
language that does not have a naturally fitting connection, i.e., right and left overlaps with
the contiguous language segments, with the exception of first and last segments, as
described above. Had any Hebrew language return not had an exact overlap with a
contiguous Hebrew word string association, it would have been rejected and replaced
with the highest ranking Hebrew word string association for that English word string that
overlaps with the contiguous Hebrew word strings, or alternative overlapping English
word strings (shorter or longer) can be retrieved from the database with their Hebrew
translations and tested for exact overlaps in Hebrew.

Attached is Appendix B (page 253), which is a print-out from the present
invention showing an example of translation using the dual-anchor overlap method in
combination with Acquisition Using Parallel Text in Two States.

Attached is Appendix C (page 297), which is a print-out from the present
invention showing an example of translation using the dual-anchor overlap method with a
combination of Acquisition Using Parallel Text in Two States and Acquisition Using
Multiple-States.

Attached is Appendix D (page 308), which is a print-out from the present
invention showing an example of translation using the dual-anchor overlap method in
combination with Target Language Flooding.

Various user-defined parameters can be established for overlap criteria. For
example, the required number of words that overlap may be greater when one or more of
the words in the overlap are stop words (e.g., "the", "it", "in") because these common
words make unreliable connection points for the combination of word strings. The longer
the overlapping string of words between a translation candidate and the two translations it
overlaps with, the less certain the word string translation needs to be. If the translation is
incorrect, it will not have large overlaps with both of its neighboring translations.

Therefore, user-defined minimum overlap requirements may be dynamic and 
require fewer or more overlapping words between parsed word string translations based 
on whether the translations are known to be correct or are just determined to be 
candidates based on the different methods of the present invention for building word 
string associations. Moreover, the minimum number of words required in the overlap for 
approval of a translation may ignore overlapping stop words for satisfying this 
requirement. 

For example, assume the user-defined requirements called for two or more 
overlapping non-stop words to approve the combination of two word string translations, 
and the overlapping parsed word strings "and I know it is good", "it is good to run two 
miles" are presented to the system as part of a longer string of words to be translated. 
This parsing would not be accepted by the system because the overlapping word string "it 
is good" does not have two non-stop words and therefore does not fulfill the user-defined 
overlap requirement. The word strings will need a larger number of overlapping words
between the segments to satisfy the requirement before the respective Target Language
translations are tested for overlap (e.g., "and I know it is good" and "know it is good to run").
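The stop-word rule in this example can be sketched as follows. This is an illustrative sketch only: the function names, the `min_content_words` parameter and the contents of `STOP_WORDS` are assumptions (the text names only "the", "it" and "in" as stop words; "is" is assumed to be one as well, since the example treats "it is good" as having a single non-stop word).

```python
# Assumed stop-word list; a real system would use a user-defined list.
STOP_WORDS = {"the", "it", "in", "is", "a", "an", "of", "to", "and"}


def overlap_words(left, right):
    """Return the longest run of words that ends `left` and begins `right`."""
    lw, rw = left.lower().split(), right.lower().split()
    for k in range(min(len(lw), len(rw)), 0, -1):
        if lw[-k:] == rw[:k]:
            return lw[-k:]
    return []


def overlap_is_acceptable(left, right, min_content_words=2):
    """Accept a parsing only if the overlap has enough non-stop words."""
    content = [w for w in overlap_words(left, right) if w not in STOP_WORDS]
    return len(content) >= min_content_words


rejected = overlap_is_acceptable("and I know it is good", "it is good to run two miles")
accepted = overlap_is_acceptable("and I know it is good", "know it is good to run")
```

Under these assumptions the first parsing is rejected, because its overlap "it is good" contains only one non-stop word, and the second is accepted, because "know it is good" contains two.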

If word string translation candidates identified by any method of the present
invention, any other automatic translation method, or created manually, are not certain to
be accurate, the dual-anchor overlap technique can require that all word strings (except
first and last word strings) must have every word of the string be overlapped by either the
left or right contiguous word string translations. For example, one possible parsing for
"complete overlap" for a word string to be translated could be as follows:

Source Language (English) Translation Query: "The best time of the year is the
summer because you can sit in the sun and then jump in the pool".

One Possible Complete Overlap Parsing:

"the best time of the year"
"time of the year is the summer because you"
"year is the summer because you can sit in the sun"
"because you can sit in the sun and then"
"sun and then jump in"
"jump in the pool"

An even more comprehensive scheme would be to only move one word forward
with each consecutive word string overlap when parsing a Source Language translation
query into overlapping word strings. For example:

"the best time of"
"best time of the"
"time of the year"
"of the year is"
"the year is the"
"year is the summer"

The process started above could be continued until each word of the translation 
query was parsed with maximum overlap. 
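Both parsing schemes can be sketched in a few lines. This is a hedged illustration, not the implementation of the present invention: `sliding_parse`, `overlap_size` and `is_complete_overlap` are hypothetical names, the window size of four words is an assumption taken from the example, and "complete overlap" is checked by requiring that the left and right overlaps of each interior word string together cover all of its words.

```python
def sliding_parse(query, size=4):
    """Parse a query into word strings that each move one word forward."""
    words = query.split()
    if len(words) <= size:
        return [" ".join(words)]
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def overlap_size(left, right):
    """Length in words of the longest suffix of `left` that is a prefix of `right`."""
    lw, rw = left.split(), right.split()
    for k in range(min(len(lw), len(rw)), 0, -1):
        if lw[-k:] == rw[:k]:
            return k
    return 0


def is_complete_overlap(strings):
    """True if every interior string is fully covered by its two neighbors."""
    for prev, cur, nxt in zip(strings, strings[1:], strings[2:]):
        if overlap_size(prev, cur) + overlap_size(cur, nxt) < len(cur.split()):
            return False
    return True


query = ("the best time of the year is the summer because you can sit "
         "in the sun and then jump in the pool")
windows = sliding_parse(query, size=4)

parsing = [
    "the best time of the year",
    "time of the year is the summer because you",
    "year is the summer because you can sit in the sun",
    "because you can sit in the sun and then",
    "sun and then jump in",
    "jump in the pool",
]
parsing_ok = is_complete_overlap(parsing)
```

The one-word-forward parse always satisfies the complete overlap check, at the cost of many more word strings to translate; the six-string parsing shown earlier also satisfies it with far fewer strings.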

Because the word strings are overlapped completely on both left and right sides
(except for first and last word strings, which only have some additional confirmation
through one-sided overlap), the translation candidates for them will not be accepted if
incorrect (or correct but for a different surrounding context). The first word string on the
left should be independently confirmed by one of the association methods of the present
invention (or manually) as an accurate translation (at least on the un-overlapped left side
of the word string) and the last word string at the end of the sentence should be
independently confirmed as an accurate translation (at least on the un-overlapped right
side). In the complete overlap parsing above, the word strings "the best time of the year"
and "jump in the pool" should both be confirmed independently as accurate translations,
or at least their left and right sides, respectively, should be. These confirmed
translations give accurate end points to anchor the chain of overlapping word string
translation candidates.

The same overlap technique applies for connection of word strings to form larger 
word strings with integrity for applications using a single state or language as described 
later. 

B. Knowledge Acquisition Using Dual-Anchor Overlap

Moreover, each time two confirmed translations with overlapping word strings
are combined, two additional database entries for cross-language translation of word
strings can be approved and added to the database based on the results of the overlap.
First, the total combined overlapping translation can be approved as one overall unit for
future use. Second, the unit of overlapping words in both Source Language and Target
Language constitutes a word string translation by the present invention and can be added
to the database for future use.

For example, assume a cross-language database with the following Language X
word strings and corresponding known Language Y translations:



Language X Word String          Language Y Translations

1. "EE KK GG XX"                1a. "ll bb ee"
                                1b. "ee kk gg xx"

2. "GG XX BB YY"                2a. "gg ll bb yy"
                                2b. "gg xx bb yy"
                                2c. "gg xx mm ll"



Based on the above database entries, the following additional database entries can 
be approved and entered as valid translations: 

3. "EE KK GG XX BB YY"          3a. "ee kk gg xx bb yy"
                                3b. "ee kk gg xx mm ll"

4. "GG XX"                      4a. "gg xx"



Entry number 3 is the combined word string translations after eliminating
overlapping words in Source Language and Target Language. Number 4 is the
overlapping word strings in both Source Language and Target Language, which confirms
the smaller word string in the overlap as an independent word string translation.

Translation candidates that are not confirmed as accurate translations on a Query
String Flooding List using the Target Language Flooding technique (or using any other
method) can be tested for large overlapping word strings in both Source Language and
Target Language. If overlapping word string translation candidates are linked together
through large overlaps and are overlapped with known word string translations at the
beginning and end of a larger translation unit, the translation candidates as well as the
word strings in each of the respective overlaps across the two languages can be approved
as translations. The above technique of identifying translations in overlapping word
strings can be used to expand any cross-language database by leveraging the existing
translations that overlap across two languages, generated automatically or manually
assembled for use by EBMT systems, Translation Memory systems or for any other
purpose.
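The derivation of entries 3 and 4 from entries 1 and 2 can be sketched as follows. This is a minimal sketch under stated assumptions: `derive_entries` and `overlap_size` are hypothetical names, the database is represented as plain Python lists, and, as in the text, the overlapping Source words are assumed to correspond to the overlapping Target words.

```python
def overlap_size(left, right):
    """Length in words of the longest suffix of `left` that is a prefix of `right`."""
    lw, rw = left.split(), right.split()
    for k in range(min(len(lw), len(rw)), 0, -1):
        if lw[-k:] == rw[:k]:
            return k
    return 0


def derive_entries(src_a, tgts_a, src_b, tgts_b, min_overlap=1):
    """Derive the combined entry and the overlap entry from two known entries."""
    k_src = overlap_size(src_a, src_b)
    if k_src < min_overlap:
        return None
    a_words, b_words = src_a.split(), src_b.split()
    combined_src = " ".join(a_words + b_words[k_src:])
    overlap_src = " ".join(b_words[:k_src])
    combined_tgts, overlap_tgts = [], []
    for ta in tgts_a:
        for tb in tgts_b:
            k = overlap_size(ta, tb)
            if k >= min_overlap:  # only target pairs that also overlap qualify
                combined_tgts.append(" ".join(ta.split() + tb.split()[k:]))
                shared = " ".join(tb.split()[:k])
                if shared not in overlap_tgts:
                    overlap_tgts.append(shared)
    if not combined_tgts:
        return None
    return {combined_src: combined_tgts, overlap_src: overlap_tgts}


new_entries = derive_entries(
    "EE KK GG XX", ["ll bb ee", "ee kk gg xx"],
    "GG XX BB YY", ["gg ll bb yy", "gg xx bb yy", "gg xx mm ll"],
)
```

Translations 1a and 2a contribute nothing because they do not overlap with any translation of the neighboring word string, so only the pairs 1b with 2b and 1b with 2c yield new entries.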

C. Other Related Applications 

The above embodiment combining the use of a cross-language association 
database and the cross-language dual-anchor overlap translation technique has clear 
applicability to improve the quality of existing technologies that attempt to equate 
information from one state to another, such as voice recognition software and optical 
character recognition (OCR) scanning devices that are known in the art, to correlate 
information across multiple sources, and to translate among different jargons or dialects 
within one language. These technologies (as well as others) can use the translation 
methods of the present invention to test the results (output) of their systems to see if 
the results can be translated. When a translation that overlaps with its 
neighbors cannot be found, the user can be alerted and queried or the system can be 
programmed to look for close alternatives in the database to the un-overlapped portion of 
the translation. Various criteria for finding alternative word strings that overlap with 
their neighbors include those based on context using the embodiment of the association 
database that produces semantic equivalents within a language (described later). All 
returns to the user, of course, would be converted back into the original language. 

In addition to aiding existing technologies that perform these applications, the 
methods of the present invention, including the cross-state learning and dual-anchor 
overlap technique, can also be applied directly to build these applications. For OCR, 
visual representations of letters and words would be used to build associations between 
the visual representation of words and word strings and the computer encoding such as 
UTF-8 and other computer languages and protocols. Text that teaches the use of 
computer languages can be set up to align the text description of a command with the 
computer language code that describes those commands as training text to build 
associations between human languages and computer languages. Written descriptions of 
code and computer code can also be used as a Parallel Text corpus for association 
building using the methods of the present invention. For voice recognition, the sound 
waves and written text would be analyzed to make the associations between the common 
ideas represented in two different states (using word strings of a written text along with 
the audio sound waves associated with the text as "Parallel Text" to train the system) as 
described later. 

IV. SINGLE STATE FREQUENCY ASSOCIATION DATABASE CREATION 
AND COMMON FREQUENCY ANALYSIS METHOD AND APPARATUS 

A. Introduction 

Another embodiment of the present invention provides (1) a method and 
apparatus for creating a Frequency Association Database ("FAD") of ideas represented 
by words and word strings within a single language (e.g., Japanese or English) and (2) a 
method and system for using the FAD to identify common relationships between and 
among two or more words and/or word strings. This second method and system, referred 
to as Common Frequency Analysis (CFA), can be used to generate lists of related ideas 
for use in various applications. 

In this embodiment, the FAD, once created, stores information about the 
proximity relationship in text between and among two or more recurring word string 
patterns. These proximity relationships, once established and stored through the first 
process, provide the basis for the second process, CFA, which is the analysis and 
identification of third word or word string associations shared in common by two or more 
words and/or word strings. This CFA process provides the basis of various knowledge 
acquisition and knowledge generation applications. 

A frequency association program can embody some of the methods of the present 
invention and can be used to build the databases of the present invention and to analyze 
the information stored in the databases to determine associations between words and/or 
word strings. Figures 2 and 3 depict memory 208 of the computer system 200 in which 
are stored a smart application 302, an association program 304, databases 306 and an 
operating system 308, for access by processor 202. The association program 304 can be 
an independent program or can form an integral part of a smart application 302. The 
association program 304 can analyze the databases 306 to determine word associations 
either in response to a query from a smart application 302, or in response to a query 
directly submitted by the user via the input device. The databases 306 can include, for 
example, FAD and document databases. 

The FAD system and method operate by parsing the text of all documents that 
are input into the system and storing information regarding which of the parsed segments 
of text are associated with one another based on the frequency of occurrence and position 
of a particular segment with respect to other segments of the document. As always, 
segments of parsed text can include words and word strings, or characters and strings of 
characters for languages that use characters that possess independent semantic value (e.g., 
a Chinese character). Prior to being operated on by the FAD system, the documents can 
be stored in a Document Database to facilitate access, parsing, and analysis of the 
documents. 

Words and word strings that frequently appear in close proximity to each other 
within a document are identified by the present invention through FAD analysis of words 
and word strings within user-defined ranges of one another. These associated words and 
word strings can be used by the second process, CFA, to identify ideas or concepts (in the 
present embodiment represented by these words or word strings) that have strong 
relationships to one another based on common relationships to other third ideas and 
concepts (also represented here by words and word strings). 

The CFA process operates on these associated word strings stored in the FAD to 
create a knowledgebase comprised of lists of related ideas. In one embodiment of the 
present invention, these lists of related ideas (represented in this embodiment by words 
and word strings) are referred to interchangeably as Knowledge Acquisition Lists or 
Semantic Equivalent Lists. Using this embodiment of CFA, the system generates a list 
for a query word or word string by identifying word strings in certain patterns around or 
near the query referred to as "Left or Right Signatures," or when combined, "Cradles," 
that are shared by third words and/or word strings. The results generated for a particular 
word or word string query identify closely related ideas which include semantic 
equivalents of the word or word string, as well as opposite ideas, examples of the idea, 
and other related ideas represented by words and word strings. These Signatures, 
Cradles, and Knowledge Acquisition Lists, once built, form a knowledgebase in each 
language that can be used in machine translation applications, search and text mining 
applications, data compression, and many other applications including artificial 
intelligence or smart applications that allow a user to ask the system to learn, and/or 
provide answers to questions, or perform actions. 

Using the FAD of the present invention to provide the input for CFA, the system 
can determine common third word and/or word string associations between or among two 
or more words or word strings. When conducting FAD analysis, the user can define the 
ranges to be examined in the documents as any number of words and/or word strings of 
user-defined size in proximity to each occurrence of each selected word or word string. 

Once these word and word string relationships are built and stored in the FAD, 
the system, based on instructions from smart application 302 (see Figure 3), will then 
perform one or more CFAs that search for words and/or word strings that are common to 
the ranges of the two or more words and/or word strings selected by smart application 
302. When the system conducts a CFA, the frequency of occurrence of words or word 
strings within the ranges of each selected word or word string can be retrieved if 
previously stored in an FAD (or any information not previously analyzed and stored in 
the FAD can be analyzed on-the-fly using text in the Document Database or any other 
available text including text on the Internet). 

Creating an FAD in a single state is similar to creating a cross-language FAD 
built using Parallel Text to identify word string translations, as described previously. In 
that case, the range was established in the Target Language documents and recurring 
words and word strings were counted to establish frequency of occurrence in the range. 
When creating an FAD in a single language or state, the principle is the same but the 
frequency and proximity of word strings are used to establish the patterns of context for 
words and word strings in the single language or state, and not translations of words and 
word strings across languages. 

An alternative to building out an FAD that documents every recurring word or 
word string proximity relationship is to identify the locations and frequency of 
occurrence of words and word strings recurring in the Document Database and store 
them in a simpler Recurrence Database to establish a word string frequency index, an 
example of which is shown in Table 4. Using a Recurrence Database as a word string 
frequency index instead of an FAD, the association program 304 can identify all the same 
word string patterns and establish the highest ranked third word and word string 
relationships shared by the two or more words and/or word strings selected by the smart 
application 302 (see Figure 3), based on user-defined weighting or other criteria. 

B. Frequency Association Database (FAD) Creation 

1. In General 

Disclosed is a method for building an FAD that can be applied to documents in a 
single language for purposes of building a database of related words and word strings 
based on their frequency and proximity to one another in the text. FADs provide the 
building blocks to be used for CFA of the present invention. The method includes: 



a. Assembling a corpus of text in a single language (can be stored in a 
Document Database). 

b. Searching for all multiple occurrences of any word or word string in the 
assembled corpus. 

c. Establishing a user-defined number of words and/or word strings of 
user-defined length on either (or both) side(s) of the word or word string 
being analyzed. This will serve as the range. In addition to being 
defined as a certain number of words, the range may be defined broadly 
(e.g., all words in the specific text in which the word or word string 
occurs) or narrowly (e.g., a specific size word string (i.e., number of 
words) in an exact proximity to the analyzed word or word string), as 
the user may determine for the specific application. 

d. Searching the corpus and determining the frequency with which each and 
every word and word string appears in the ranges around the selected 
word or word string being analyzed and, if desired, their proximity to the 
selected word or word string. 

If the range is defined as including, for example, up to 30 words on either side, 
the system will record the frequency of occurrence of every word and word string within 
30 words of each of these words or word strings. If the range is defined as three-word 
word strings to the right of a query word or word string, and four-word word strings to 
the left of the query, only the three-word word strings to the right and the four-word word 
strings to the left of the query will be registered for recurrence of this pattern. The 
system can note the proximity of each word or word string to the word or word string 
being analyzed. 
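
Steps a through d, with a range defined as a number of words on either side, can be 
sketched as follows. This Python is an illustrative sketch under stated assumptions 
rather than the disclosed implementation: the names word_strings and build_fad_entry 
are hypothetical, text is lowercased and split on whitespace, and candidate word strings 
are capped at max_len words to keep the example small. 

```python
from collections import Counter


def word_strings(tokens, max_len):
    """Yield (start_index, word_string) for every string of 1..max_len words."""
    for start in range(len(tokens)):
        for n in range(1, max_len + 1):
            if start + n > len(tokens):
                break
            yield start, tuple(tokens[start:start + n])


def build_fad_entry(documents, query, window=30, max_len=3):
    """Count words and word strings found within `window` words of `query`."""
    query_words = tuple(query.lower().split())
    q_len = len(query_words)
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()
        candidates = list(word_strings(tokens, max_len))
        for i in range(len(tokens) - q_len + 1):
            if tuple(tokens[i:i + q_len]) != query_words:
                continue
            # The range extends `window` words to the left and to the right.
            lo, hi = i - window, i + q_len + window
            for start, gram in candidates:
                end = start + len(gram)
                inside_range = start >= lo and end <= hi
                overlaps_query = start < i + q_len and end > i
                if inside_range and not overlaps_query:
                    counts[" ".join(gram)] += 1
    return counts


corpus = ["kids love ice cream", "kids love ice cream before going to bed"]
fad_entry = build_fad_entry(corpus, "kids love", window=2, max_len=2)
```

With a two-word range, only "ice", "cream" and "ice cream" fall inside the range of 
"kids love" in this tiny corpus, each with a frequency of two. Recording the value of 
start relative to i alongside each count would capture the proximity mentioned in step d. 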

As described above, for certain applications the system can be instructed to 
recognize and disregard common words such as "I", "a", "to", etc. However, those 
common words may be considered based on the goal of the specific application for the 
system. The FAD can also be built based on frequencies of words and word strings 
appearing exactly a user-defined number of words away, to either the left or the right, 
from the word or word string being analyzed. In such cases the range could be defined 
narrowly by the user for an application as one word or one word string of a specific size 
in an exact proximity to the word or word string being analyzed. 

For instance, the system can analyze the documents available to determine that 
they include the phrase "go to the game" 10,000 times and it may find "go to the game" 
within a 20-word range of the word "Jets" 87 times. In addition, the system may 
determine that "go to the game" appeared exactly seven words in front (in English to the 
left, in a language that reads right to left, like Hebrew, to the right) of the word "Jets" 
eight times (counting from the first word "go" of the word string). 
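
Counting exact offsets of this kind can be sketched as below. The function name 
offset_counts is hypothetical, tokenization is plain whitespace splitting, and the 
offset is measured from the first word of the phrase to the analyzed word, so a positive 
offset means the phrase begins before the word. 

```python
from collections import Counter


def offset_counts(documents, phrase, word):
    """Count how often `phrase` starts exactly k words before `word`, for each k."""
    phrase_words = phrase.lower().split()
    target = word.lower()
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()
        phrase_starts = [
            i for i in range(len(tokens) - len(phrase_words) + 1)
            if tokens[i:i + len(phrase_words)] == phrase_words
        ]
        word_positions = [i for i, tok in enumerate(tokens) if tok == target]
        for start in phrase_starts:
            for pos in word_positions:
                counts[pos - start] += 1  # negative when the word comes first
    return counts


sample = ["we go to the game when the Jets play"]
offsets = offset_counts(sample, "go to the game", "Jets")
```

In the sample sentence the phrase starts six words before "Jets", so the only recorded 
offset is 6. A general range count, such as the 20-word range above, is then the sum of 
the counts whose offsets fall inside that range. 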

Any combination of recurring patterns of words and word strings based on the 
number of words between them can also be recorded. For instance, the database can 
record the number of sentences in the database in which the word "Jets" appears three 
words before "go to the game" and "tickets" appears nine words after "go to the 
game." That pattern may occur three times and the frequency of that word pattern in the 
text may be used by an application that will deduce the meaning of an idea to either help 
provide an answer to a question asked by the user, or help carry out a request made by the 
user. 

Searching for words or word strings based on user-defined proximity is known in 
the art: search applications use user-defined search parameters to present documents to 
the user that contain the search terms within the required proximity. Such search 
methods do not, however, use an application to 
automatically search these parameters (based on, for instance, frequency in text) and do 
not store this information to be used by the system to automatically acquire or learn 
knowledge based on further automatic steps of an application. 

These FADs of the present invention indicating exact recurring word string 
patterns in text based on their proximity to each other measured by the number of words 
between them can be generated individually using a series of narrowly defined ranges. 
Typically, however, the most frequently useful word and word string patterns are those 
contiguous to or generally in close proximity to (on the left and right of) the word or 
word string being examined. 

2. FAD Utilizing an Index of Recurring Word Strings 

A large number of calculations would be required if the above method were used 
to build a database of every proximity and frequency relationship between all recurring 
word patterns in the available text as described above. Many relationships being built as 
a result of this comprehensive process might never be used for an application. The 
following technique involves indexing recurring word strings to avoid upfront processing 
that may never be used to establish exact relationships. 

In addition, the following indexing process can be used as an alternate process to 
the method described above for automatically determining frequency and proximity 
associations, and to perform general range frequency analysis and an analysis of exact 
patterns based on specific word or word string locations within a range as described 
above. This embodiment of the invention is a method for building the Recurrence 
Database, which only includes the location of each recurring word and word string in the 
Document Database and not its proximity to other entries. This method is as follows: 
first, search for all words and word strings for recurrences in the available text; second, 
record in the database the "locations" for each word and word string with multiple 
occurrences by noting its position within each document in which it occurs, for example, 
by identifying the word number of the first word in the string, along with the document 
number in the Document Database. Alternatively, just the document number of the 
document in the Document Database in which the word or word string is located can be 
stored. In this case, the position of the word or word string can be searched and 
determined on-the-fly when responding to a specific query. 

Table 4 is an example of entries in the Recurrence Database. 

Table 4 

Word or Word String                               Frequency and Location 

"kids love a warm hug"                            20 times (word 58/doc 1678; word 45/doc 560; 
                                                  word 187/doc 45,231; word 689/doc 123; ...) 

"kids love ice cream"                             873 times (word 765/doc 129; word 231/doc 
                                                  764,907; word 652/doc 4,501; ...) 

"kids love a warm hug before going to bed"        12 times (word 58/doc 1678; word 45/doc 560; 
                                                  word 187/doc 45,231; ...) 

"kids love ice cream before going to bed"         10 times (word 765/doc 129; word 231/doc 
                                                  764,907; ...) 

"kids love staying up late before going to bed"   17 times (word 23/doc 561; word 431/doc 
                                                  76,431; ...) 

"before going to bed"                             684 times (word 188/doc 28; word 50/doc 
                                                  560; word 769/doc 129; word 436/doc 76,431; 
                                                  ...) 



As indicated, each occurrence of a word or word string found more than once in 
the Document Database will be added to the frequency count and its location noted by 
designating the word number position in a document as well as the number assigned to 
identify the document in which it occurs, or by using any other identifier of the word or 
word string's location in the Document Database. 
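
A Recurrence Database of this shape can be sketched as a mapping from each recurring 
word string to its list of locations. The sketch below is illustrative only: the name 
build_recurrence_db is hypothetical, documents are assumed to be supplied as a mapping 
from document number to text, word numbers are counted from 1, and word strings are 
capped at max_len words. 

```python
from collections import defaultdict


def build_recurrence_db(documents, max_len=4):
    """Map each recurring word string to its (doc_number, word_number) locations."""
    locations = defaultdict(list)
    for doc_number, text in documents.items():
        tokens = text.lower().split()
        for start in range(len(tokens)):
            for n in range(1, max_len + 1):
                if start + n > len(tokens):
                    break
                word_string = " ".join(tokens[start:start + n])
                # Word number of the first word in the string, counted from 1.
                locations[word_string].append((doc_number, start + 1))
    # Keep only word strings that occur more than once in the corpus.
    return {s: locs for s, locs in locations.items() if len(locs) > 1}


docs = {1: "kids love ice cream", 2: "kids love a warm hug"}
recurrence_db = build_recurrence_db(docs)
frequency = {s: len(locs) for s, locs in recurrence_db.items()}
```

The frequency of a word string is simply the length of its location list, so the index 
stores no proximity relationships; those are computed later from the stored locations 
when a query requires them. 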

If the Recurrence Database is fully and completely generated (including word 
number positions as well as document numbers) for all documents in the Document 
Database, the location information allows the system to calculate any general frequency 
relationships, or any specific word string pattern frequency relationships as described 
above. Until the Recurrence Database is fully built, the system will perform the FAD 
analysis on two or more ranges in the documents in the Document Database on-the-fly after 
identifying the locations from the Recurrence Database or performing a general search of 
a word string in the Document Database on-the-fly using any search technique known in 
the art. Any word or word string recurrence not yet in the Recurrence Database can be 
added at the time the system responds to a query involving it by analyzing documents in 
the Document Database directly to supplement analysis of the Recurrence Database. 
After the information obtained by direct analysis of the documents in the Document 
Database has been used for the specific task for which it was generated, the information 
can then be stored in the Recurrence Database for any future use. Whether the system 
builds an FAD analysis using the Recurrence Database, or whether those relationships are 
created on-the-fly by searching documents with the query as a keyword, the system will 
identify relationships between any recurring ideas represented by words or word strings. 

C. Common Frequency Analysis - Knowledgebase Acquisition and 
Generation by Association Method and Apparatus 

Common Frequency Analysis (CFA) is a technique of the present invention that 
generates lists of ideas (represented here by words and word strings) that have common 
relationships with the two or more ideas (words and/or word strings) being analyzed. 
Several different embodiments of CFA can be used to generate different types of 
Knowledge Acquisition Lists of related ideas. These lists can be used in a number of 
applications, including smart applications, which conduct additional analysis using other 
embodiments of CFA to retrieve or learn new information to aid in answering a question 
or performing a task. 

Referring now to Figure 3, in a CFA process, smart application 302 can query the 
Frequency Association Database or the Recurrence Database, via the association program 
304, with two or more words and/or word strings to identify what third words and/or 
word strings are frequently associated within user-defined ranges with some or all of the 
presented words and/or word strings. In another embodiment of the CFA aspect of the 
present invention, the system, when furnished with a word or word string query (from, 
for example, the user or smart application 302) identifies two or more words and/or word 
strings using two or more FAD entries for the query to make associations between the 
two or more identified words and/or word strings. This type of CFA is used to identify 
word string Signatures and Cradles as part of the process for Knowledge Acquisition List 
generation to identify semantic equivalents and other relationships between words and/or 
word strings (as described later). 

There are two different methods of performing CFA: (1) Independent Common 
Frequency Analysis (ICFA), and (2) Related Common Frequency Analysis (RCFA). 
Additionally, after employing either of the two processes, the system can do further 
statistical analysis by employing them in an additional generation or generations, or by 
combining the results and/or segments of any CFA for further CFAs. 

1. Independent Common Frequency Analysis (ICFA) 

When the smart application 302 presents the association program 304 (see Figure 
3) with two or more words and/or word strings for CFA, the system can identify all 
words and word strings frequently related to the presented words and/or word strings by 
referring to an FAD of the present invention. The system can then identify those words 
and/or word strings that are frequently associated with some or all of the two or more 
presented words and/or word strings based on user-defined criteria. 

The system can rank the common associations it identifies for the presented words 
and/or word strings in a variety of user-defined ways. For example, the system can rank 
the associations by adding (or multiplying or any other user-defined weighting) the 
frequencies for the common word or word string associations to each of the presented 
words and/or word strings. As another example of a user-defined parameter, a minimum 
frequency (as measured by position on the list, raw number of occurrences or any other 
measure) on all tables of presented words and/or word strings may be required. 

For example, using entries in the Recurrence Database above, if the task was 
looking for ideas common to the word strings "kids love" and "before going to bed", the 
system would calculate the frequency with which third concepts like "ice cream" are 
within a user-defined range in all available documents with the first concept "kids love" 
as one analysis, and the frequency with which "ice cream" and the second concept 
"before going to bed" appear together as the second analysis. The frequency of each of 
the independent relationships can then be used by an application that will give relative 
value to each. This will be based on how high (user-defined as either absolutely or 
relatively) the frequency of "ice cream" ranks on both the "kids love" frequency table and 
the "before going to bed" frequency table (based on user-defined ranges). 

Based on user-defined values, this method, after analyzing "ice cream", might then 
identify "a warm hug" by locating it on the "kids love" frequency table (based on the 
user-defined range or proximity requirements of the application) for relative frequency 
and then locate "a warm hug" on the "before going to bed" frequency table. All other 
frequent associations (which may be user-defined) on both frequency tables will be 
compared, for example "staying up late", and scored based on user-defined values of 
combined relative frequencies from both tables. The highest-ranking word string, based 
on user-defined weighting of each frequency association, will be produced by the system. 

The result of this analysis may be that the system will be able to deduce that, 
while "kids love" "ice cream" more than "kids love" "warm hugs," "kids love" "warm 
hugs" more than "kids love" "ice cream" "before going to bed". 
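
The ICFA comparison of independent frequency tables can be sketched as below. This is 
a minimal illustration, not the disclosed implementation: the function name icfa and 
its parameters are hypothetical, and the two frequency tables are filled with the counts 
from Table 4 on the assumption that the range is limited to word strings directly 
adjacent to each presented word string. 

```python
def icfa(freq_tables, combine=sum, min_freq=1):
    """Rank word strings that appear on every frequency table.

    `combine` is the user-defined weighting applied to the per-table
    frequencies (sum by default); `min_freq` is the minimum frequency a
    word string must reach on every table to be considered.
    """
    common = set.intersection(*(set(table) for table in freq_tables))
    scored = {}
    for term in common:
        freqs = [table[term] for table in freq_tables]
        if min(freqs) >= min_freq:
            scored[term] = combine(freqs)
    # Highest combined score first; ties broken alphabetically.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


# Frequencies of third word strings adjacent to each presented word string.
kids_love = {"ice cream": 873, "a warm hug": 20, "staying up late": 17}
before_bed = {"ice cream": 10, "a warm hug": 12, "staying up late": 17}

by_sum = icfa([kids_love, before_bed])
strict = icfa([kids_love, before_bed], min_freq=12)
```

With a simple sum, the very high "kids love" frequency of "ice cream" dominates the 
ranking. Requiring a minimum frequency of 12 on both tables removes "ice cream" and 
leaves "staying up late" and "a warm hug", which shows how the user-defined weighting 
determines the result. 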

2. Related Common Frequency Analysis (RCFA) 

In addition to finding common word and word string associations that each 
queried word or word string has independently, another embodiment may look to identify 
frequent appearances of words and/or word strings that are found in user-defined ranges 
in only those documents containing two or more of the words and/or word strings being 
analyzed. A Related Common Frequency Analysis differs from an Independent 
Common Frequency Analysis in that related words and/or word strings being analyzed 
for RCFA appear together in a user-defined range of a document as opposed to appearing 
independently for analysis. The embodiment of an RCFA according to the present 
invention employs the following steps: 

First, locate all documents from the available corpus that contain two or more of 
the presented words and/or word strings. For example, if documents are stored in a 
Document Database, they could be located by returning specific document numbers 
representing documents that contain two or more of the presented words and/or word 
strings. The document numbers are those numbers designated by an indexing scheme 
known in the art or described in the present application. 

Then, identify and compare each word and word string in a user-defined range in 
proximity to the presented words and/or word strings, and record the frequency for any 
words and word strings in the ranges. Once again, the user-defined range can be narrow 
and include only recurring words or word strings in a specific proximity (such as 
contiguous) to the presented words or word strings. 

As an example, assume the system is presented with the two word strings "kids 
love" and "before going to bed" for analysis under RCFA. Further assume that a 
Recurrence Database contains the following entries: 

"kids love a warm hug"                             20 times 
"kids love ice cream"                              873 times 
"kids love a warm hug before going to bed"         12 times 
"kids love ice cream before going to bed"          10 times 
"kids love staying up late before going to bed"    17 times 
"before going to bed"                              684 times 

When conducting an RCFA using two words and/or word strings for analysis, a 
Recurrence Database will direct the system to the documents in the Document Database 
that have both segments (e.g., "kids love" and "before going to bed") as they will have 
the same document number associated with them. Usually, the system will locate only 
those documents where the word strings are within a user-defined number of words of 
each other or in any other user-defined qualifying proximity to one another. 

Once the system has identified all documents in the Document Database that 
contain "kids love" within the designated proximity to "before going to bed", the system 
builds a frequency chart of all recurring words and word strings within a user-defined 
range around the two presented word strings. In the example based on the limited amount 
of text in the database (and assuming the user-defined range requires words and word 
strings to be adjacent to the words or word strings being analyzed), "ice cream" occurs 10 
times in the range of the two presented phrases and thus has a frequency of 10, "staying 
up late" occurs 17 times in the range of the two presented phrases and thus has a 
frequency of 17, and "a warm hug" occurs 12 times in the range of the two presented 
phrases and thus has a frequency of 12. 
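
The adjacent-range case of RCFA can be sketched as below. The sketch is illustrative 
only: the names find_all and rcfa_between are hypothetical, the range is assumed to be 
the word string lying directly between the two presented word strings, max_gap is an 
assumed proximity limit, and the sample corpus is a scaled-down stand-in for the 
Document Database rather than real data. 

```python
from collections import Counter


def find_all(tokens, phrase_words):
    """Return every index at which `phrase_words` starts in `tokens`."""
    n = len(phrase_words)
    return [i for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == phrase_words]


def rcfa_between(documents, first, second, max_gap=10):
    """Count word strings lying between `first` and `second` in the same text."""
    first_words = first.lower().split()
    second_words = second.lower().split()
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()
        for i in find_all(tokens, first_words):
            gap_start = i + len(first_words)
            for j in find_all(tokens, second_words):
                # `second` must follow `first` within `max_gap` words.
                if gap_start < j <= gap_start + max_gap:
                    counts[" ".join(tokens[gap_start:j])] += 1
    return counts


corpus = (
    ["kids love ice cream before going to bed"] * 2
    + ["kids love staying up late before going to bed"] * 3
    + ["kids love a warm hug"]
)
related = rcfa_between(corpus, "kids love", "before going to bed")
```

Texts that contain only one of the two presented word strings, such as the last sample 
text, contribute nothing, which is the feature that distinguishes RCFA from ICFA. 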

If the range relative to the two RCFA word strings is expanded, the existing 
Recurrence Database may well have other word strings that will add to the above 
frequency counts depending on the user-defined range of word strings. For instance, 
there may be recurring words and word strings in the same text near "kids love" and 
"before going to bed" but not directly adjacent to them (e.g., "kids love ice cream and 
other sweets before going to bed"). This also means that if the phrase "ice cream and 
other sweets" repeats, it will be an independent answer to the query as well. The 
aspect of the present invention that identifies semantic equivalent terms may also group 
the returns "ice cream" and "ice cream and other sweets" as a single semantic category 
(e.g., sweets) in an application (based on user-defined criteria). In addition, the order of 
the ideas may be different while the meaning is the same (e.g., "before going to bed, kids 
love ice cream"), which is desirable to include in the analysis. The aspect of the 
invention that identifies semantically similar concepts (in combination with the dual- 
anchor overlap technique) will enable different concept order with the same meaning to 
be identified as semantically equivalent. 

Furthermore, known or determined semantic equivalents can be used in place of 
the searched words and word strings (using RCFA or ICFA) to find recurring words and 
word strings within the equivalents' ranges as alternative embodiments of the invention. 
For instance, the system can also search "kids like", "kids really love", "kids enjoy", 
"children enjoy", or "children love" in place of "kids love". The same technique can be 
used to replace "before going to bed" with equivalents known to the system like "before 
bed", "before going to sleep", or "before bedtime". 
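
Substituting semantic equivalents amounts to running the same analysis over every 
combination of equivalent word strings. A minimal sketch, assuming the equivalents are 
already available as plain lists (the name expand_queries is hypothetical): 

```python
from itertools import product


def expand_queries(first_equivalents, second_equivalents):
    """Return every pairing of equivalents for the two presented word strings."""
    return list(product(first_equivalents, second_equivalents))


first = ["kids love", "kids like", "kids really love",
         "kids enjoy", "children enjoy", "children love"]
second = ["before going to bed", "before bed",
          "before going to sleep", "before bedtime"]

query_pairs = expand_queries(first, second)
```

Each resulting pair can then be submitted to an RCFA or ICFA, and the frequency counts 
for the returned word strings summed across pairs according to user-defined criteria. 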

Both the word order issue and the semantic alternative issue just described are 
addressed by the present invention's ability to detect word string patterns. As described 
later, the common frequency techniques of the present invention will yield a large 
number of semantically equivalent words and/or word strings that can be used to expand 
the analysis with many more relevant semantic search terms. Additionally, as explained 
later, the present invention can also recognize ideas that are ordered differently but are 
identical in meaning (e.g., "the boy and the spotted dog" and "the dog with the spots and 
the boy" would be recognized as semantically equivalent larger units) by identifying 
patterns of common classes of word strings that appear in patterns of common larger general 
groups together. Both the method to determine semantically equivalent ideas and the 
method to recognize semantically equivalent larger ideas whose component building 
block ideas are arranged in different orders are additional aspects of the present 
invention's knowledge acquisition ability to understand natural language. 

3. Second Level Frequency Analysis (RCFA or ICFA)

In another embodiment, the system may perform CFA on either or both of the
first or second word or word string that made up the query, and a selected third word or
word string identified in the CFA (i.e., a returned result), which will add new information
to the analysis performed for an application. For example, if the selected common
association based on the frequency of all words and word strings within the common
range of "before going to bed" (first) and "kids love" (second) is "ice cream" (third), this
embodiment generates either an RCFA or ICFA between either "before going to bed"
(first) and "ice cream" (third), or "kids love" (second) and "ice cream" (third), and
selects associations based on those two frequency analyses. For example, "ice cream"
and "before going to bed" may have a high common frequency association with "stomach
ache", which may be useful in the analysis for an application used according to the
present invention. Moreover, any two or more words and/or word strings can be
analyzed using the same techniques in as many combinations or as many generations as
the user or the smart application defines. Specific applications will call for automated
analysis identifying which CFA to perform on each generation of association frequency
analysis based on each successive CFA result. More complex applications will identify
two or more frequency analyses to be performed before the two or more independent
results are used in combination.

V. SINGLE STATE KNOWLEDGE ACQUISITION USING CFA

Words and/or word strings in a language that represent the same idea can be
identified as part of the same semantic family based on the pattern of word strings that
frequently appear around them in language. These patterns become apparent by looking
at the frequency with which specific words and word strings are found immediately prior
to a particular word or word string (in English, to the left of the particular word or word
string) as well as following the particular word or word string (in English, to the right of
the particular word or word string). Thus, the Knowledge Acquisition List generation
aspect of the present invention uses two specific CFAs designed to leverage the fact that
words and word strings representing ideas that are alike (or share some other semantic
relationship) will have commonality in the type and order of the words and word strings
frequently leading into and away from them.

Using RCFA or ICFA in this embodiment to create Knowledge Acquisition Lists,
the system can generate a comprehensive word and word string database of highly related
ideas based on frequently shared word strings to both the right and the left of the related
ideas. The most highly related words and word strings (i.e., those sharing the same
frequent left and right context word strings) are usually semantically equivalent, although
other related information may rank high as well. Other related ideas include opposites
(e.g., if the query is "hard" the return "soft" may rank high); related ideas by broad class
(e.g., if the query is "dark blue" the return "orange" may rank high); examples (e.g., if the
query is "massive fraud" the return "skewing documents and misrepresenting data" may
rank high); and other related knowledge.

If, for example, the system is asked to identify words and/or word strings that
have the same or almost the same meaning as another word or word string (i.e., the words
and word strings are semantically similar (or synonymous)), the system can perform a
first CFA to find the words and word strings frequently to the left and right of the query,
and then perform a second CFA to identify all other words and word strings in that
language that most closely share the same left and right context word strings. Typically,
the more similar the formations of left and right context word strings shared by two
different words and/or word strings, the more similar in meaning they are. While
opposites will share high frequency common associations, they will diverge strongly on
certain important context associations that create an "opposites Signature" pattern that the
system can identify to either filter out the word and word string opposites of the query, or
provide a list of opposites for use in other applications.

The character of the association between any idea represented by a word or word
string and any other idea represented by a word or word string will be defined by the
relationship between their respective sets of Signatures identified by the system. The
system uses the association databases to detect frequently recurring specific word
formations within user-defined ranges tailored to detect the word patterns surrounding an
idea that define the relationship between the idea and other ideas. Thus, Right and Left
Signatures (or Cradles when using RCFA) of a word or word string consist of all the
contexts represented by various surrounding word strings in which that word or word
string occurs. Taking the most frequent right and left context word strings and finding
what other word strings occur frequently between those very same Signatures identifies
synonymous or near synonymous or other highly related phrases (word strings) and/or
words.

Other word strings that have a semantic relationship also share common left and
right context word strings. Members of the same general class like places, colors, names,
numbers, dates, sports, etc., have many common context word strings that the system can
use to identify them. Other relationships like words and word strings representing
examples of the query word or word string, or word strings representing other related
facts to a query will share certain common context word strings that will be identified by
the CFA aspect of the present invention, and those certain common context word strings
define that particular relationship.

The character of each of the relationships is defined by the shared context word
strings along with the context word strings that are not shared. The user gives the system
examples of words and/or word strings that define a relationship, and the method and
system for word string Cradle and Signature sorting is used. Other methods of the
present invention that help identify semantic equivalents on a Knowledge Acquisition
List include (1) the method to determine the direct mutual relationship two word strings
have on each other's Knowledge Acquisition Lists, (2) the method to determine the
different Knowledge Acquisition Lists that two words and/or word strings both appear
on, and (3) a method that generates synonymous expressions of a query plus Left
Signature and query plus Right Signature and tests them for overlap.

A general explanation will now be given of how, using the association databases and a
smart application 302 (see Figure 3), the system detects semantically equivalent word strings
and other related knowledge through CFA. The system can also run
ICFA and RCFA on the presented words and word strings and combine the results using
a user-defined weighting process. The Knowledge Acquisition List filtering and sorting
methods of the present invention are then described.

A. Knowledge Acquisition List Generation Using ICFA 

One embodiment using a specific word formation around a word or word string to
perform ICFA will identify words and/or word strings that are equivalents or near
equivalents in semantic value (i.e., meaning) as well as other related words and word
strings to any queried word or word string. This embodiment involves the following steps:

Step 1, receiving a query consisting of a word or word string (the query phrase) to be
analyzed, and (using the FAD aspect of the present invention) returning a user-defined
number of words and/or word strings (the returned phrases) of a user-defined minimum and
maximum size that occur with the highest frequency where the returned phrase is located
directly to the left of the query phrase in all available documents. The larger the
recurring user-defined word string, typically, the more precise (specific) the ultimate
results will be.

Step 2, producing an FAD analysis on each of a user-defined number of the top ranked
results from Step 1 using a range of one word or a word string to the right of each word
or word string analyzed (the system will rank by frequency of occurrence the recurring
words and word strings to the right of each of the words or word strings returned in
Step 1 and analyzed in Step 2). The frequencies of all identical words and word strings
produced in Step 2 are then added.

Step 3, producing an FAD analysis on the query and returning a user-defined number of
words and/or word strings (the returned phrases) of a user-defined minimum and maximum
size that occur with the highest frequency directly to the right side of the query (again,
word strings of two or more words are typically desirable for accuracy).

Step 4, producing an FAD analysis on each of a user-defined number of the top ranked
words and word strings returned from Step 3 using a range of one word or a word string
directly to the left of each of the words and word strings being analyzed. Again, the
results will be ranked by the frequency of occurrence of the words and word strings
leading into each word and word string returned in Step 3 and analyzed in Step 4. The
frequencies of all common word and word string results in Step 4 are then added.

Step 5, identifying all words and/or word strings that are produced by both Steps 2 and 4.
In one embodiment, the frequency number of each word or word string returned in Step 2
is multiplied by the frequency number of the same word or word string produced in
Step 4. The highest ranking words and/or word strings (based on the products of their
frequencies from Step 2 and Step 4 results) will typically be the words and word strings
most semantically equivalent to the query. The list produced by this process is referred
to as a Knowledge Acquisition List.
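The five steps above can be sketched, for illustration only, on a very small hypothetical corpus. Everything in this sketch is an assumption made for the example: the function names, the in-memory list of tokenized sentences standing in for the FAD, the parameter defaults, and the choice to exclude the query itself from its own list. It is not presented as the implementation of the invention.

```python
from collections import Counter

def neighbors(tokens, phrase, side, sizes):
    """Count word strings of the given sizes directly left or right of `phrase`."""
    counts = Counter()
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != phrase:
            continue
        for size in sizes:
            if side == "left":
                start, end = i - size, i
            else:
                start, end = i + n, i + n + size
            if start >= 0 and end <= len(tokens):
                counts[tuple(tokens[start:end])] += 1
    return counts

def count_all(sentences, phrase, side, sizes):
    """Sum the neighbor counts over every sentence in the corpus."""
    total = Counter()
    for tokens in sentences:
        total += neighbors(tokens, list(phrase), side, sizes)
    return total

def knowledge_acquisition_list(sentences, query, sig_size=2, top_n=3, max_len=2):
    query = query.split()
    sizes = range(1, max_len + 1)
    # Step 1: most frequent word strings directly left of the query.
    left_sigs = [s for s, _ in
                 count_all(sentences, query, "left", [sig_size]).most_common(top_n)]
    # Step 2: word strings directly right of each left signature, frequencies added.
    left_anchor = Counter()
    for sig in left_sigs:
        left_anchor += count_all(sentences, sig, "right", sizes)
    # Step 3: most frequent word strings directly right of the query.
    right_sigs = [s for s, _ in
                  count_all(sentences, query, "right", [sig_size]).most_common(top_n)]
    # Step 4: word strings directly left of each right signature, frequencies added.
    right_anchor = Counter()
    for sig in right_sigs:
        right_anchor += count_all(sentences, sig, "left", sizes)
    # Step 5: keep returns found on both sides; score by product of frequencies.
    # Dropping the query itself is an assumption of this sketch.
    common = (left_anchor.keys() & right_anchor.keys()) - {tuple(query)}
    scored = [(" ".join(c), left_anchor[c] * right_anchor[c]) for c in common]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))

corpus = [s.split() for s in [
    "the suspect was detained for questioning",
    "the suspect was held for questioning",
    "the suspect was detained on charges",
    "the suspect was held on charges",
    "he was arrested on charges",
    "the suspect was arrested",
]]
result = knowledge_acquisition_list(corpus, "detained")
```

On this toy corpus, "held" ranks above "arrested" because it shares both the left and the right contexts of "detained" more often.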

As an alternative embodiment, in Step 5, the returns from Step 2 and Step 4 can 
be ranked based on the total number of different word string returns from Step 1 and Step 
3 that they share with the query. 

The combined process of Step 1 and Step 3 is an embodiment of ICFA where a
single word or word string is used to independently identify groups of two different
words and/or word strings related to the query. The combination of Step 2, Step 4 and
Step 5 is another embodiment of ICFA where two words and/or word strings are used to
identify common associated third words and/or word strings.

The following examples illustrate these embodiments using a hypothetical
database to create associations among words and word strings from the system's
Document Database, and then create associations using ICFA. Assume the word
"detained" is entered by the user to determine all of the word and word string equivalents
known to the system for that word (along with other related words and word strings).

In Step 1, taking only the top three results to simplify the illustration (although the
number of results analyzed by the present invention would typically be much larger and
is user-defined), the system first determines the most frequent three-word word strings
directly to the left of "detained". The length of the word strings directly to the left of the
analyzed word ("detained") can be one size or a range of sizes and is user-defined (in this
example three-word word strings). The result of this analysis - the list of word strings of
a user-defined length to the left of the presented word - is called the "Left Signature
List." Assume that the system in the above example returns the following:

1. "the suspect was ___"

2. "was arrested and ___"

3. "continued to be ___"

In Step 2, the system operates on the returned Left Signature List. The system
locates words and/or word strings that most frequently follow the above three returned
three-word word strings - i.e., those words and/or word strings to the right of the returned
members of the Left Signature List. The length of the word strings that the system
returns in this operation is user-defined or can be unrestricted. The result of this analysis
- each list of words and/or word strings to the right of each Left Signature List entry - is
called a "Left Anchor List." Assume that the system in the above example returns the
following Left Anchor Lists:



Left Signature List          Left Anchor List

1. "the suspect was ___"     a. "arrested" (240 freq.)
                             b. "held" (120)
                             c. "released" (90)

2. "was arrested and ___"    a. "held" (250)
                             b. "convicted" (150)
                             c. "released" (100)

3. "continued to be ___"     a. "healthy" (200)
                             b. "confident" (150)
                             c. "optimistic" (120)

Also in Step 2, the frequencies of identical returns across the Left Anchor Lists can be 
added. The only common returns in the Left Anchor Lists are: 

a. "held" 120 + 250 = 370

b. "released" 90 + 100 = 190

In Step 3, the system determines the three most frequently occurring two-word
word strings directly to the right of the selected query "detained" in the documents in the
database. Again, the number of frequently occurring word strings analyzed is
user-defined (here, as in Step 1, the system returns the top three occurring word strings).
The length of the word strings directly to the right of the analyzed word ("detained") is
also user-defined; in this example it is two-word word strings (note: any length word string or
range of lengths may be used in Step 1 and Step 3). The result of this analysis - the list
of word strings of a user-defined length to the right of the presented word - is called the
"Right Signature List." Assume that the system in the above example returns the
following Right Signature List:

1. "___ for questioning"

2. "___ on charges"

3. "___ during the"

In Step 4, the system operates on the returned Right Signature List. The system
locates words and/or word strings that most frequently occur before the above three
returned two-word word strings - i.e., those words and/or word strings to the left of the
returned two-word word strings. The length of the word strings that the system returns in
this operation can be user-defined or can be unrestricted. The result of this analysis -
each list of words and/or word strings to the left of each Right Signature List entry - is
called a "Right Anchor List." Assume that the system in the above example returns the
following Right Anchor Lists:

Right Signature List         Right Anchor List

1. "___ for questioning"     a. "held" (300)
                             b. "wanted" (150)
                             c. "brought in" (100)

2. "___ on charges"          a. "held" (350)
                             b. "arrested" (200)
                             c. "brought in" (150)

3. "___ during the"          a. "beautiful" (500)
                             b. "happy" (400)
                             c. "people" (250)

Similar to Step 2, the frequencies of common returns in the Right Anchor Lists produced 
by different Right Signature List returns can be added. The only common returns in the 
Right Anchor Lists are: 
a. "held" 300 + 350 = 650

b. "brought in" 100 + 150 = 250

In Step 5, an ICFA is conducted and the system returns a ranking. In the present
example, a weighted frequency is produced by multiplying the frequencies of the
common returns of Steps 2 and 4 (i.e., returns on both a Left Anchor List and a Right
Anchor List), producing a Knowledge Acquisition List as follows:

1. "held" 650 x 370 = 240,500

2. "arrested" 200 x 240 = 48,000
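The arithmetic of Steps 2, 4 and 5 in this example can be checked with a short sketch. The frequencies are the hypothetical values given in the Anchor Lists above; the variable and function names are assumptions introduced only for this illustration.

```python
from collections import Counter

# Hypothetical Anchor Lists from the "detained" example, keyed by signature.
left_anchor_lists = {
    "the suspect was": {"arrested": 240, "held": 120, "released": 90},
    "was arrested and": {"held": 250, "convicted": 150, "released": 100},
    "continued to be": {"healthy": 200, "confident": 150, "optimistic": 120},
}
right_anchor_lists = {
    "for questioning": {"held": 300, "wanted": 150, "brought in": 100},
    "on charges": {"held": 350, "arrested": 200, "brought in": 150},
    "during the": {"beautiful": 500, "happy": 400, "people": 250},
}

def summed(anchor_lists):
    """Add the frequencies of identical returns across a set of Anchor Lists."""
    total = Counter()
    for returns in anchor_lists.values():
        total.update(returns)
    return total

left_total = summed(left_anchor_lists)    # Step 2 totals
right_total = summed(right_anchor_lists)  # Step 4 totals

# Step 5: returns present on both sides, weighted by the product of frequencies.
common = left_total.keys() & right_total.keys()
kal = sorted(((t, left_total[t] * right_total[t]) for t in common),
             key=lambda kv: -kv[1])
```

Only "held" and "arrested" survive Step 5, because every other return appears on one side only.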

An alternative embodiment for ranking gives no consideration to the specific
weighted frequency. Instead, all results produced on at least one Left Anchor List and on
at least one Right Anchor List are ranked according to the total number of Anchor Lists
on which they appear. In the above example, the rankings using this embodiment would
be:

Rank   Knowledge Acquisition Item   # of Anchor Lists

1      "held"                       4

2      "arrested"                   2

Although both "released" and "brought in" were each produced twice in the
analysis, neither was produced on both a Left Anchor List and a Right Anchor List
("released" was produced twice on Left Anchor Lists and "brought in" was produced
twice on Right Anchor Lists). Other user-defined weighting schemes combining the
number of Anchor Lists and total frequency may be utilized. For example, one
embodiment can rank returns based on the total number of different Anchor List
appearances and any returns found on an equal number of different Anchor Lists can be
sub-ranked based on total frequency.

An alternative embodiment for ranking can call for multiplying the number of 
Left Anchor Lists the result appears on by the number of Right Anchor Lists the result 
appears on. In the above example, the rankings would be as follows: 

Rank   Knowledge Acquisition Item   Anchor List Product

1      "held"                       4

2      "arrested"                   1
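The two frequency-independent rankings above (total number of Anchor Lists, and the product of Left and Right Anchor List counts) can be reproduced with the following sketch. It uses only list membership from the hypothetical example; the names are assumptions made for the illustration.

```python
from collections import Counter

# Membership of each hypothetical Anchor List from the "detained" example.
left_anchor_lists = [
    {"arrested", "held", "released"},
    {"held", "convicted", "released"},
    {"healthy", "confident", "optimistic"},
]
right_anchor_lists = [
    {"held", "wanted", "brought in"},
    {"held", "arrested", "brought in"},
    {"beautiful", "happy", "people"},
]

def appearances(anchor_lists):
    """Count on how many Anchor Lists each return appears."""
    return Counter(term for members in anchor_lists for term in members)

left_n = appearances(left_anchor_lists)
right_n = appearances(right_anchor_lists)

# A return qualifies only if it is on at least one Left and one Right Anchor List.
qualified = left_n.keys() & right_n.keys()

by_count = sorted(((t, left_n[t] + right_n[t]) for t in qualified),
                  key=lambda kv: (-kv[1], kv[0]))
by_product = sorted(((t, left_n[t] * right_n[t]) for t in qualified),
                    key=lambda kv: (-kv[1], kv[0]))
```

"released" and "brought in" are excluded in both rankings because each appears on one side only.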

The above illustration is based on a relatively small number of documents in the
Document Database. The Document Database typically will be larger and can include
documents remotely accessible to the system via networks such as the Internet. In one
embodiment of the invention, the user not only defines the number of results to be
included on a Signature List, but also can stop the analysis when the designated number
of results has been found with a user-defined minimum frequency. This acts as a
cut-off and will save processing power when using a large database.

Other examples of user-defined parameters for ICFA to produce a Knowledge
Acquisition List for a query word or word string can consider frequently recurring words
and/or word strings to the left and right sides of the query in various lengths. Thus,
instead of having a fixed user-defined length for the word strings returned in the Left and
Right Signature Lists, an embodiment might have a variable user-defined length to the
word strings returned in these Signature Lists, with a minimum and maximum length to
the word strings. More frequently occurring word strings of different sizes used in the
analysis on both the left and right sides of the query provide more "contextual angles" to
identify related words and word strings. In addition, this embodiment may include a
minimum number of occurrences for a returned word or word string to qualify for the
Signature List.

In one embodiment of a variable word string analysis using this aspect of the 
present invention, the query from the previous example ("detained") can be analyzed as 
follows: 

In Step 1, from an available database generate a Left Signature List of a
user-defined number (of a user-defined minimum and maximum length) of the most frequent
word strings to the left of the query. This is the same process as in Step 1 of the previous
example except that here word strings of various lengths are used rather than fixed-length
word strings. If the user-defined parameters are (1) return the eight most frequent word
strings, (2) with the word strings having a minimum length of two words and a maximum
length of four words, and (3) with a minimum of 500 occurrences in
the corpus, the results in the previous example might look (again, using a hypothetical
corpus) as follows:

Left Signature List            Frequency

1. "people were"               1,000

2. "arrested and"              950

[...]                          700

6. "the people were"           650

7. "was arrested and"          575

8. "they were reportedly"      500

In Step 2, generate the Left Anchor Lists from the results of the Left Signature 
List by locating the most common words and word strings directly to the right of the 
returns from Step 1, as in the previous example. 

In Step 3, generate a Right Signature List using the same defined parameters
described in Step 1 of this example, with the following results:



Right Signature List           Frequency

1. "for questioning"           1,750

2. "on charges"                1,520

3. "during the"                1,350

4. "because of"                1,000

5. "due to"                    750

6. "in connection"             600

7. "on charges of"             575

8. "for questioning after"     500

In Step 4, generate the Right Anchor Lists from the results of the Right Signature
List by locating the most frequent recurring words and word strings to the left of the
returns from Step 3, as in the previous example.

In Step 5, rank all results produced on at least one Left Anchor List and on at least
one Right Anchor List according to the total number of lists on which the result appears.
Alternatively, rankings can be determined by multiplying the total number of Left
Anchor Lists a result appears on by the total number of Right Anchor Lists it appears on.
In addition, total frequency can be used to weight the rankings. A variety of user-defined
weighting schemes can be used as previously described.
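As an illustration of the variable-length Signature List parameters used in this example (number of returns, minimum and maximum word string length, and minimum occurrence), the following sketch builds Left and Right Signature Lists from a small hypothetical corpus. The function name `signature_list`, its parameter names, and the toy sentences are assumptions introduced for the example only.

```python
from collections import Counter

def signature_list(sentences, query, side, min_len, max_len, top_n, min_count):
    """Most frequent word strings of min_len..max_len words beside the query."""
    query = query.split()
    q = len(query)
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - q + 1):
            if tokens[i:i + q] != query:
                continue
            for size in range(min_len, max_len + 1):
                if side == "left":
                    start, end = i - size, i
                else:
                    start, end = i + q, i + q + size
                if start >= 0 and end <= len(tokens):
                    counts[" ".join(tokens[start:end])] += 1
    # Apply the minimum-occurrence cut-off, then keep the top_n returns.
    qualified = [(s, c) for s, c in counts.most_common() if c >= min_count]
    return qualified[:top_n]

corpus = [s.split() for s in [
    "many people were detained for questioning",
    "the people were detained on charges",
    "he was arrested and detained for questioning",
]]
left = signature_list(corpus, "detained", "left", 2, 3, top_n=8, min_count=2)
right = signature_list(corpus, "detained", "right", 2, 3, top_n=8, min_count=2)
```

With a realistic corpus the minimum count would be far higher (500 in the example above); a value of 2 is used here only because the toy corpus has three sentences.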

It should be noted that while the above example query was a word ("detained"),
the system could produce semantic equivalents for word strings of any size where the
word string represents a semantically identifiable idea. For instance, if the system were
queried with "car race", it would produce potential semantic equivalents for "car race".
Performing the same steps described in the embodiments above, which utilize an ICFA to
determine near semantic equivalents, the system might produce "stock car race", "auto
race", "drag race", "NASCAR race", "Indianapolis 500", "race", among other
semantically related words and word strings. The system accepts queries and produces
associated ideas using exactly the same process, without regard to the size of the query
word string or the returns. Knowledge Acquisition Lists will also include other related
terms like, for example, "contest", "sporting event", "Dale Earnhardt, Jr." or "boat race".



B. Knowledge Acquisition List Generation Using RCFA 

Another embodiment of the present invention for creating Knowledge
Acquisition Lists including semantic equivalent associations is based on the use of
Related Common Frequency Analysis (RCFA) rather than the Independent Common
Frequency Analysis (ICFA) as shown above. The same basic techniques and principles
applied using ICFA for semantic acquisition can be applied using RCFA. The RCFA
technique of the present invention for generating a Knowledge Acquisition List including
semantic equivalents and other relationships involves the following steps:

Step 1: Receive a word or word string query for which semantically equivalent
words and word strings (along with other related words and word strings) will be found,
and search a Document Database, Recurrence Database or FAD to identify user-defined
sized word string portions of documents containing that word or word string. In an
example, the word string "initial public offering" is entered as a query to identify its
semantic equivalents using RCFA. The system then searches and identifies portions of
documents with the "initial public offering" word string. The user may define and limit
the number of portions returned.

Step 2: For each occurrence of the query word string found in Step 1, analyze the
returned portions by recording the frequency of occurrence of (i) the words and/or word
strings of user-defined size to the left of the query, in combination with (ii) the words
and/or word strings of a user-defined size to the right of the query. This step creates a
combined Left and Right Signature that "cradles" the query, called the "Left/Right
Signature Cradle" or "Cradle". This step is an embodiment of RCFA where a word or
word string query is used to generate two related word strings.

In our example, the size of the user-defined left word string can be set at two or
three words, and the user-defined right word string can be set at two or three words.
With a user-defined number of Cradles to be returned (for example, one hundred)
occurring a user-defined minimum number of times (for example, five), the calculations
have a cut-off point. This process could result in the following partial set of hypothetical
returns for the query "initial public offering":

1. "announced a successful ___ of common stock"

2. "shares at an ___ price of"

3. "announced the ___ of its"

4. "it considers an ___ of common stock"

5. "completed an ___ raising a"

6. "announced its ___ of shares"

7. "announced the proposed ___ for its common"

8. "announced an ___ of stock"

9. "completed its ___ of shares"

10. "in representing ___ underwriters for"

Step 3: Search the Document Database for the most frequent words and word
strings (with an option to set a user-defined maximum size) that appear between the left
and right word strings of each of the Left/Right Signature Cradles produced in Step 2.
Identifying these other frequently occurring words and/or word strings that appear in
between the word strings of the Left/Right Signature Cradles produces potential semantic
equivalents (and other related words or word strings). A user-defined minimum number
or percentage of Left/Right Signature Cradles can optionally be required for a return to
qualify. This step is an embodiment of RCFA where two words and/or word strings are
used to identify related third words and/or word strings.

Step 4: The resulting words and/or word strings that appear in between the word
strings of the Left/Right Signature Cradle (i.e., the other words and word strings that
"fill" the various Cradles) can be ranked based on total number of different Left/Right
Signature Cradles filled, total frequency, or some other method or combination of
methods.

In one preferred embodiment, the returns are first ranked by total number of
different Left/Right Signature Cradles filled. Returns with the same number of different
Left/Right Signature Cradles filled would then be ranked by total frequency of all filled
Left/Right Signature Cradles. Another embodiment of a ranking criterion could also give
weight to the frequency of the Left/Right Signature Cradle that produced the return, or
extra weight could be given based on the size of the word strings in the Left/Right
Signature Cradle.

In the above example, top results in Step 3 might be the words and/or word strings
"IPO", "ipo" (the results may be case sensitive), "Initial Offering", "offering", "Public
Offering" and "stock offering", all of which "fill" the unresolved portion (vacated by the
query) of some of the Left/Right Signature Cradles.
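The RCFA steps above (recording Cradles around the query, finding what else fills them, and ranking by number of different Cradles filled and then by total frequency) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the fixed two-word left and right sizes, the exclusion of the query from its own results, and the toy corpus are all introduced for the example and are not taken from the specification.

```python
from collections import Counter, defaultdict

def find_cradles(sentences, query, left_size, right_size):
    """Step 2: count (left, right) word-string pairs that surround the query."""
    query = query.split()
    q = len(query)
    cradles = Counter()
    for tokens in sentences:
        for i in range(left_size, len(tokens) - q - right_size + 1):
            if tokens[i:i + q] == query:
                left = tuple(tokens[i - left_size:i])
                right = tuple(tokens[i + q:i + q + right_size])
                cradles[(left, right)] += 1
    return cradles

def fill_cradles(sentences, cradles, max_fill, exclude):
    """Step 3: collect word strings found between the two halves of each Cradle."""
    fillers = defaultdict(Counter)  # filler -> Counter of Cradles it fills
    for tokens in sentences:
        for left, right in cradles:
            n_left, n_right = len(left), len(right)
            for i in range(len(tokens) - n_left + 1):
                if tuple(tokens[i:i + n_left]) != left:
                    continue
                for size in range(1, max_fill + 1):
                    j = i + n_left + size
                    if tuple(tokens[j:j + n_right]) == right:
                        filler = " ".join(tokens[i + n_left:j])
                        if filler != exclude:
                            fillers[filler][(left, right)] += 1
    return fillers

def rank(fillers):
    """Step 4: rank by different Cradles filled, then by total frequency."""
    rows = [(f, len(c), sum(c.values())) for f, c in fillers.items()]
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))

corpus = [s.split() for s in [
    "the company announced its initial public offering of shares",
    "the company announced its IPO of shares",
    "the firm completed its initial public offering of shares",
    "the firm completed its IPO of shares",
    "the firm completed its offering of shares",
    "the company announced its merger of equals",
]]
query = "initial public offering"
cradles = find_cradles(corpus, query, left_size=2, right_size=2)
ranking = rank(fill_cradles(corpus, cradles, max_fill=3, exclude=query))
```

In this toy corpus "IPO" fills both Cradles and "offering" fills one, so "IPO" ranks first; "merger" fills neither because its right context differs.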

When using ICFA or RCFA to determine semantic equivalents, different numbers
of varying sized word strings for Left Signatures, Right Signatures, or Left/Right
Signature Cradles can be used together in one analysis using ICFA or RCFA as shown
above. The more various sized word strings used as Left Signatures, Right Signatures,
and Left/Right Signature Cradles as part of an analysis, the more angles on the concept
the system will have to identify the query word or word string's semantic value.

One embodiment can call for the most frequent word strings in a range of sizes,
for example, the most frequent 1000 word strings between three and five words long to the
left and right of the query to form the Left/Right Signature Cradles. As another example
of an embodiment, the system can define the Left/Right Signature Cradles as the most
frequent three-word word strings to the left and right of the query, along with a
user-defined number of most frequent four-word word strings to the left and right of the query,
plus a user-defined number of the most frequent five-word word strings to the left and
right of the query. The number of words in a word string for Left/Right Signature
Cradles is user-defined and can include any combination of ranges of word string sizes
leading into and out of the concept (represented by a word or word string) being
analyzed. The resulting words and word strings produced by filling the Cradles can be
ranked by total number of different Cradles filled, giving user-defined weights to results
produced by the different sized Cradles or the frequency count of the Cradles filled. Any
specific embodiment using ICFA for semantic equivalents or to identify any other
relationship can be done using RCFA, and vice versa.

Appendix A presents examples of association results using RCFA for a variety of
queries. The first 15 examples show partial results for the queries (i.e., the top 20-25
returns per query), while the final example (for the query "it is important to note") shows
the top 1000 returns. The user-defined settings for these results were: (1) find the first
1000 occurrences of the query; (2) record all Cradles of two and three-word word strings
to the left and two and three-word word strings to the right; (3) rank Cradles by the
frequency with which they are found; (4) find all words and word strings that fill the
Left/Right Signature Cradles; (5) return results based on total number of different
Cradles filled; (6) rank results with the same number of Cradles filled by total frequency
of all Cradles filled (weight can also be given to higher frequency Cradles that are filled).
The corpus used to produce the results comprises approximately 2.4 billion words.
Note that the "Relative Score" listed in Appendix A represents a user-defined metric, as
described above, that reflects one measure of confidence that a particular return is
semantically related. The lower the score, the lower the confidence. The lowest scores, for
example, scores of 1 or 2, represent returns that have the lowest confidence. With a larger
corpus, some of these low scoring returns may be raised to a higher level of confidence if
they appear more frequently based on the user-defined measuring criteria.

Another embodiment of the present invention associates two or more words
and/or word strings with third words and word strings that appear on all (and also qualify
based on possible user-defined ranking requirements) of their Knowledge Acquisition
Lists. This embodiment of the present invention, referred to as Common List Member
Analysis, can be used to enhance the results of applications that benefit from semantic
associations such as search, text mining and AI applications. For example, when two or
more Knowledge Acquisition Lists are examined and common word and word string
results are identified, the common terms can be used to enhance a search function
operating on unstructured text. Hence, if the terms "Bonds" and "San Francisco" were
entered as two separate keywords for a particular search query into a search engine
known in the art, the present invention could supply additional keywords to the
search by identifying words and word strings that appear with a user-defined minimum
ranking on both original keywords' Knowledge Acquisition Lists (with user-defined
weighting). Hence, "baseball" and "the Giants" may be added to retrieve and rank
content relating to Barry Bonds rather than financial bonds.
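
One way Common List Member Analysis might be sketched is as an intersection of the
top-ranked portions of each keyword's list. The function name common_list_members, the
max_rank parameter, and the list contents below are hypothetical; the summed-rank ordering
is only one possible choice of user-defined weighting.

```python
def common_list_members(lists, max_rank):
    """Return terms ranked within max_rank on every list, best combined rank first.

    lists maps each keyword to its ranked Knowledge Acquisition List (best first).
    """
    qualifying = [
        {term: pos for pos, term in enumerate(ranked[:max_rank], start=1)}
        for ranked in lists.values()
    ]
    shared = set.intersection(*(set(q) for q in qualifying))
    # Order shared terms by the sum of their ranks across all lists.
    return sorted(shared, key=lambda t: (sum(q[t] for q in qualifying), t))


# Hypothetical Knowledge Acquisition Lists for the two keywords.
kal = {
    "Bonds": ["debentures", "baseball", "the Giants", "home run", "bond trading"],
    "San Francisco": ["the Giants", "Bay Area", "baseball", "Golden Gate", "bond trading"],
}
extra_keywords = common_list_members(kal, max_rank=3)
print(extra_keywords)  # ['the Giants', 'baseball']
```

Relaxing max_rank to 5 in this toy data would also admit "bond trading", which shows how
the minimum-ranking setting controls which sense of the keywords the added terms favor.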

In addition, terms common to Knowledge Acquisition Lists (i.e., Lists derived 
either from the keywords themselves or from the terms contained on the Lists of the 
keywords) may be used to rank results by relevance or create categories to organize 
results (by looking at terms that form category clusters based on common appearances on 
Lists). In the above example, if text in the database included information on financial 
bond trading in San Francisco, Knowledge Acquisition Lists for "Bonds" and "San 
Francisco" might both include high ranking returns like "bond trading" and "debentures" 
that could be used by the system as additional keywords or factors to enable enhanced 
search, the ranking of returned documents, or the categorization of results. In such a 
case, categories such as "baseball" and "finance" might both have been recognized by the 
system, giving the user a choice of which category to pursue. Also, as described below, 
Knowledge Acquisition Lists can be filtered for synonyms of the query (or keyword), 
which can be used to enhance and expand a particular search's results beyond documents 
that contain the keyword(s) to include documents that contain the synonyms of the 
keyword(s) as well. 

C. Knowledge Acquisition List Sorting and Filtering 

The use of ICFA and RCFA to produce a Knowledge Acquisition List will
include some results on the list that fit the Left/Right Signature Cradle (or appear on the
Left and Right Anchor Lists) but are not semantic equivalents. This is particularly true if
the user-defined number of Signatures or Cradles in common with the query needed to
qualify as a return is not high. For example, many words and/or word strings that have
an opposite meaning to the query word or word string will fit many of the same
Left/Right Signature Cradles as the query, as will other related but non-semantically
equivalent words and word strings.

For example, assume an RCFA is performed on the query "in favor of" and
Cradles such as "the court ruled ___ the plaintiff" and "the senator voted ___
the amendment" are produced. It can be easily seen how both the query's synonyms like
"for" as well as opposites like "against" will fill these Cradles and appear on the
Knowledge Acquisition List.

Although these other non-semantic equivalent word strings will be useful for
many applications, if an application requires that only semantic equivalents be included
on the list for a query, filtering techniques of the present invention can be employed and
will produce a Knowledge Acquisition List with only semantic equivalents. These
filtering techniques described below include (1) Direct Mutual Relationships - which
considers not only the relationship of the rank of a return on the query's ICFA or RCFA
Knowledge Acquisition List, but also the rank of the query on each return's own CFA
Knowledge Acquisition List; (2) Semantic Triangulation - a method and system that
considers the number of Knowledge Acquisition Lists (as well as the rankings on those
lists) that both the query and one of the returns of the query appear on. This filtering
technique can help identify a return as a near semantic equivalent of a query, even if the
return ranks low on the query's Knowledge Acquisition List. This is accomplished by
identifying the low-ranking return's rank and/or frequency (based on user-defined
settings) on a user-defined number of Knowledge Acquisition Lists generated for other
returns of the query that all share a close semantic relationship with the query (i.e., that
appear on a number of different lists with the query); and (3) Query + Signature Overlap
- in this method, the overlap technique within a single language is employed in an
embodiment of the present invention to identify semantic equivalents. The overlap
technique accomplishes this in the same way it connects contiguous concepts
(represented by word strings) in chains of logic. The returns found on Knowledge
Acquisition Lists of (i) a query word or word string with its Left Signature and (ii) a
query word or word string with its Right Signature, are tested for overlap. The
synonymous expressions for the word or word string being analyzed can be identified as
the overlapping words in the overlapping word strings.

Moreover, another technique of the present invention provides further methods
for using word string patterns to automatically sort word and word string returns from
Knowledge Acquisition Lists into different lists that can be labeled by the user to
accurately reflect their semantic character relative to the query term (e.g., an opposite of
the query (e.g., query: "hot", return: "cold"); a member of a common class with the query
(e.g., query: "blue", return: "purple")).

This technique, described below, is referred to as the Signature Pattern Sorting
technique of the present invention. Words and word strings can also be sorted by their
semantic relationship to one another by utilizing the Direct Mutual Relationship and
Semantic Triangulation techniques. As the user provides training examples to the system
of terms embodying the relationship (e.g., "hot" and "cold" for opposites), the method
and system can identify patterns that characterize the relationship based on appearances
and rankings of words and word strings on Knowledge Acquisition Lists. The present
invention can use that generalized pattern in the future to associate words and word
strings that share that generalized pattern as terms characterizing the identified
relationship.

1. Association Utilizing Direct Mutual Relationships and Semantic 
Triangulation 

The Direct Mutual Relationship technique can be used to filter the results of a
Knowledge Acquisition List by generating a separate Knowledge Acquisition List using
RCFA or ICFA, as described above, for each return on the query's Knowledge
Acquisition List. By creating independent Knowledge Acquisition Lists for all returns on
the query's list, the system can identify whether the original query ranks above a
user-defined threshold on each of the Knowledge Acquisition Lists of its own returns. The
higher the mutual ranking of the query and a return on each other's Knowledge
Acquisition List, the more likely the return is a semantic equivalent of the query.
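
A minimal sketch of the Direct Mutual Relationship filter follows, assuming the Knowledge
Acquisition Lists have already been generated and are held as ranked Python lists. The
names rank_of and direct_mutual_filter, the threshold value, and the toy lists are
hypothetical and serve only to illustrate the mutual-rank test.

```python
def rank_of(term, ranked_list):
    """1-based rank of term on a ranked list, or None if it is absent."""
    return ranked_list.index(term) + 1 if term in ranked_list else None


def direct_mutual_filter(query, lists, threshold):
    """Keep returns whose own list ranks the query at or above the threshold."""
    kept = []
    for ret in lists[query]:
        back_rank = rank_of(query, lists.get(ret, []))
        if back_rank is not None and back_rank <= threshold:
            kept.append(ret)
    return kept


# Hypothetical lists: the query's own list plus one list per return.
lists = {
    "car": ["automobile", "vehicle", "truck", "garage"],
    "automobile": ["car", "vehicle", "motor car"],
    "vehicle": ["car", "truck", "automobile"],
    "truck": ["lorry", "pickup", "vehicle", "car"],
    "garage": ["carport", "driveway", "workshop"],
}
kept = direct_mutual_filter("car", lists, threshold=2)
print(kept)  # ['automobile', 'vehicle']
```

In this toy data "truck" is dropped because the query ranks only fourth on its list, and
"garage" is dropped because the query does not appear on its list at all.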

The Semantic Triangulation method of the present invention also makes use of
independently generated Knowledge Acquisition Lists for each of the query's returns to
establish which returns are near-semantic equivalents of the query. The Semantic
Triangulation aspect of the present invention examines the independently generated
Knowledge Acquisition Lists of the returns to identify those words and word strings that
appear above a user-defined threshold ranking on a user-defined number of the different
Knowledge Acquisition Lists that the query also appears on as a return. For any return
on a query's Knowledge Acquisition List that is also a return on a user-defined number or
percentage of other Knowledge Acquisition Lists which contain the query as a return
(based on their rankings on the shared lists as well), no matter how low ranked that return
is on the query's List, a Knowledge Acquisition List will be generated and a Direct
Mutual Relationship analysis can be performed to further refine the semantic relationship
between the return and the query.

As just described, the Direct Mutual Relationship and Semantic Triangulation
methods can be used together to rank returns by semantic closeness to the query. Special
weighting can be given to the Direct Mutual Relationship, the rank of the list member on
the original query's list and the rank of the query on each of its returns' lists. These
results can be used to determine what will remain on the original query's Knowledge
Acquisition List, based on user-defined criteria for applications that call for semantic
equivalents only.

For example, if "IPO" is entered into the system for semantic equivalent analysis,
the system employing RCFA or ICFA might produce a Knowledge Acquisition List with
various results such as "initial public offering", "stock sale", "initial offering", and "stock
market", among others. Although "stock market" is a related concept to the query "IPO",
it is not a semantic equivalent. Using the above-described filtering techniques, separate
Knowledge Acquisition Lists will be generated for "initial public offering", "stock sale",
"initial offering", and "stock market".

After generating these lists, the Direct Mutual Relationship aspect of the present
invention might determine that "IPO" (the original query) appears materially lower on the
Knowledge Acquisition List generated for "stock market" than on the other returns' lists,
and the Semantic Triangulation method might determine that "stock market" consistently
appears lower than the query and the other returns on the independent lists generated for
"initial public offering", "stock sale" and "initial offering". For these reasons,
user-defined parameters might remove "stock market" from the Knowledge Acquisition List
for "IPO" for applications like translation, voice recognition, search, and other
applications that prefer only close semantic equivalents.
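
The Semantic Triangulation step of the "IPO" example can be sketched as a count of how
many lists a candidate shares with the query. The function name triangulate, the max_rank
and min_lists parameters, and all list contents (including the term "flotation") are
hypothetical values chosen for illustration.

```python
def triangulate(query, lists, max_rank, min_lists):
    """Return terms that share at least min_lists lists with the query.

    A list counts when the query appears on it within max_rank; a candidate
    qualifies when it also appears within max_rank on enough of those lists.
    """
    shared_lists = [
        ranked[:max_rank]
        for owner, ranked in lists.items()
        if owner != query and query in ranked[:max_rank]
    ]
    counts = {}
    for top in shared_lists:
        for term in top:
            if term != query:
                counts[term] = counts.get(term, 0) + 1
    return sorted(t for t, n in counts.items() if n >= min_lists)


# Hypothetical lists generated for the query and for each of its returns.
lists = {
    "IPO": ["initial public offering", "stock sale", "initial offering", "stock market"],
    "initial public offering": ["IPO", "initial offering", "stock sale", "flotation"],
    "stock sale": ["initial public offering", "IPO", "initial offering", "flotation"],
    "initial offering": ["IPO", "initial public offering", "flotation", "stock sale"],
    "stock market": ["Wall Street", "the market", "equities", "IPO"],
}
near_equivalents = triangulate("IPO", lists, max_rank=4, min_lists=2)
print(near_equivalents)
# ['flotation', 'initial offering', 'initial public offering', 'stock sale']
```

In this toy data "stock market" never appears on the lists of the other returns, so it
does not qualify, while "flotation" qualifies even though it is absent from the query's own
list. Under the method described above, such a term would then be a candidate for a Direct
Mutual Relationship test.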

The results of the two above analyses can be employed based on user-defined
settings. For efficient processing, in one embodiment, only a user-defined number of top
ranking phrases of a query's Knowledge Acquisition List are independently tested, each by
generating its own CFA to perform the above analysis. If, however, a phrase appears
with a low rank on a query's Knowledge Acquisition List (or does not even appear at all),
but the word or phrase appears on a user-defined number of lists of the query's
established semantic equivalents (even if it ranks low on them as well), the phrase can
be tested by generating an independent Knowledge Acquisition List to test for the
"mutual" consideration (i.e., where the query ranks on the other phrase's list).

When the user furnishes the system with a plurality of words and/or word strings 
that are synonyms and then furnishes it with a training set of pairs of words and/or word 
strings that are related but not synonymous, the pattern of Knowledge Acquisition List 
appearances and rankings that is unique to the synonyms or non-synonyms can be used to 
identify words and word strings in the future that are synonyms of one another. 

Similarly, the system can also use the examples of terms furnished by the user 
that are non-synonymous that have a specific relationship to one another (e.g., opposites, 
class members) as training examples, and look to identify any general pattern for this 
relationship between the terms on each other's Knowledge Acquisition Lists as well as
look for patterns of these terms relative to one another on other Knowledge Acquisition
Lists. The system can then use these patterns to identify the general relationship between 
two terms that share those patterns. 

Both the Direct Mutual Relationship and the Semantic Triangulation techniques
can be used to identify patterns based on appearances and rankings on Knowledge
Acquisition Lists that identify other semantic relationships. For instance, after the user
furnishes the system with training examples of words and word strings that are members
of a common class (e.g., "New York" and "Los Angeles" are U.S. cities),
the system may identify a pattern of Knowledge Acquisition List appearances and
rankings that can be generalized and used to identify other words and word strings that
represent U.S. cities.

Additionally, a Knowledge Acquisition List appearance and ranking pattern
common to different groups of class members can further identify a more general pattern
that will indicate that two words and/or word strings represent common class members.
For example, if the system analyzes Knowledge Acquisition Lists using training words
and word strings furnished by the user representing U.S. cities, colors, names, and
numbers, and finds a pattern of list appearances and rankings that characterizes the general
relationship of class members, the system can use the pattern in the future to generally
identify the relationship between two terms as class members.

2. Association Utilizing Queries and Signature Overlaps 

This method employs the requirement of an overlap of words as a filtering
technique to leave only semantic equivalents on a Knowledge Acquisition List. This
method can either refine an existing Knowledge Acquisition List or be used to create an
independent list of only semantic equivalents of a query. This method takes a query word
or word string and identifies a user-defined number of Cradles (or independent Left
Signatures and Right Signatures) of a user-defined sized range of word strings. Next, the
query plus a user-defined number of Left Signatures, each taken together as a longer unit
word string (Query + Left Signature), are analyzed using RCFA (or ICFA) to produce
Knowledge Acquisition Lists for the Query + Left Signature word strings. Next, the
query plus a user-defined number of Right Signatures are each taken as a unit to produce
a number of Knowledge Acquisition Lists for the chosen Query + Right Signature word
strings. Next, a user-defined number of top ranked members of the Knowledge
Acquisition Lists for the Query + Left Signature word strings are tested for overlapping
words and word strings between the right side of each of them and the left side of a
user-defined number of members of the Query + Right Signature Knowledge Acquisition
Lists. The overlapping word or words in each overlapping word string identified in this
last step are typically semantic equivalents of the query.

For example, in the earlier example using the query "initial public offering", the
identified Left Signatures are added to the query and a Knowledge Acquisition List is
generated for each of these larger strings. Therefore, an analysis of a Left Signature +
Query such as "for an initial public offering" will be used as a query itself to generate
semantic equivalents, as will other Left Signatures + Query such as "announced the initial
public offering" and "the proposed initial public offering".

Next, Right Signature + Query word strings like "initial public offering price of"
and "initial public offering of stock" are used as queries to generate Knowledge
Acquisition Lists (and potential synonyms) for these phrases.

Next, the members of the Left Signature + Query lists are tested on their right
sides for overlap with the left side of the user-defined qualifying members of the Right
Signature + Query lists. The words and word strings that overlap are semantic equivalent
words and word strings of the original query (e.g., initial public offering). One example
of such a result is if the Left Signature + Query word string "announced the initial public
offering" generated a list that included "went public with the IPO", and the Right
Signature + Query word string "initial public offering of stock" had a qualifying list
member of "IPO of equity", then "IPO" is the overlapping word or word string and,
therefore, is presumed to be the synonym of the term "initial public offering".
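
The overlap test in this example can be sketched as a search for the longest run of words
that ends a member of one list and begins a member of the other. The function names
overlap and overlap_equivalents are hypothetical, and the second pair of list members
("stock offering") is invented toy data added alongside the "IPO" members named above.

```python
def overlap(left_member, right_member):
    """Longest run of words that ends left_member and also begins right_member."""
    a, b = left_member.split(), right_member.split()
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:
            return " ".join(a[-size:])
    return None


def overlap_equivalents(left_list, right_list):
    """Collect every distinct overlap between members of the two lists."""
    found = []
    for left_member in left_list:
        for right_member in right_list:
            shared = overlap(left_member, right_member)
            if shared and shared not in found:
                found.append(shared)
    return found


# Members of a Left Signature + Query list and a Right Signature + Query list.
left_list = ["went public with the IPO", "announced the stock offering"]
right_list = ["IPO of equity", "stock offering of shares"]
candidates = overlap_equivalents(left_list, right_list)
print(candidates)  # ['IPO', 'stock offering']
```

Whole-word comparison is used here so that partial-word matches are never reported; a
fuller implementation would also apply the user-defined ranking limits to both lists.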

The Query + Signature Overlap filtering technique can be combined with the
other filtering methods. In one embodiment, Direct Mutual Relationship and/or Semantic
Triangulation can be employed as a first step before employing the Query + Signature
Overlap filtering method.

3. Association Utilizing Word Synonym Flooding 

In addition to the method and system of the present invention just described to
identify semantically similar words and word strings, the present invention can also
incorporate a single-state or language Flooding method to further help identify semantic
equivalent word strings of a query word string or to modify the results of a CFA. This
embodiment uses a word-for-word or word-for-phrase thesaurus to identify synonyms of
words. In addition to individual words, the thesaurus can be populated with idioms and
co-locations associated with their semantic equivalents.

A query word string is broken down into individual words (and/or idioms and
co-locations) and a list of semantic equivalents for each word (and/or each idiom and
co-location) is identified using the thesaurus (and/or word-for-word (or word-for-phrase)
semantic equivalents using CFA). A corpus of text is then searched for word
strings with a minimum number of synonyms for each of the query word string words
(counting only one synonym for each word toward the minimum) in a user-defined
maximum sized word string. An original word from the query word string can be used
instead of one of its synonyms to satisfy the search criteria. This method is conceptually
similar to the Target Language Flooding method of the present invention for building
word string translations between two languages, except in this embodiment a thesaurus is
used instead of a cross-language dictionary. If, for example, a technical dictionary is used
that defines technical jargon in terms of common words, then the method produces
translation between two variant forms of the language (e.g., technical and lay). For
instance, if the thesaurus included an entry for "non-metastasized" equating to
"localized" and an entry for "oncological mass" equating to "cancer," the phrase
"non-metastasized oncological mass" would equate with the phrases "localized oncological
mass," "non-metastasized cancer," and "localized cancer," among possibly others based
on user-defined search parameters and text being used to perform the Flooding.
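
The thesaurus example above can be approximated by the following sketch. It is a
simplification and an assumption: rather than searching a corpus for word strings
containing a minimum number of synonyms, it enumerates candidate rewrites by substitution
and then keeps those found verbatim in a small text. The names flood_variants, thesaurus,
units and corpus_text are hypothetical.

```python
from itertools import product


def flood_variants(units, thesaurus):
    """Enumerate rewrites of a query by swapping units for thesaurus synonyms."""
    # Each unit may stay as the original word or be replaced by a synonym.
    choices = [[unit] + thesaurus.get(unit, []) for unit in units]
    original = " ".join(units)
    variants = [" ".join(combo) for combo in product(*choices)]
    return [v for v in variants if v != original]


# The query is assumed to be pre-segmented into a word and an idiom.
thesaurus = {
    "non-metastasized": ["localized"],
    "oncological mass": ["cancer"],
}
units = ["non-metastasized", "oncological mass"]
variants = flood_variants(units, thesaurus)
print(variants)
# ['non-metastasized cancer', 'localized oncological mass', 'localized cancer']

# Crude stand-in for the corpus search: keep variants that occur verbatim.
corpus_text = "the biopsy showed a localized cancer in the left lung"
attested = [v for v in variants if v in corpus_text]
print(attested)  # ['localized cancer']
```

The substring check is only a stand-in for the windowed corpus search described above,
which would also tolerate intervening words up to the user-defined maximum string size.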

4. Word String Cradle or Signature Pattern Sorting

The present invention can also be trained to recognize the patterns of Signature
and Cradle word strings to the left and right of any word or word string that identify 
relationships between a Knowledge Acquisition List result and a query (e.g., opposites, 
class members, a concept and an example, other related knowledge). The user can give 
the system a group of examples that characterize the relationship and the system learns 
the word string Signature and/or Cradle patterns that provide the relationship character. 

For example, to train the system to recognize opposites, the user might supply the 
following three queries with three members from each query's original Knowledge 
Acquisition List that were opposites of the query, as follows: 

     Query                     Opposites

1.   "good"                    "bad", "very bad", "awful"

2.   "world class scholar"     "stupid", "dumb", "moron"

3.   "cold"                    "hot", "very hot", "boiling"

The user can also give additional examples of synonyms of the query and its 
opposites for further training. The system will then look for the Left and/or Right 
Signatures (or Cradles) that are unique to the opposites of the query. 

This embodiment of the present invention, like the generation of Knowledge
Acquisition Lists, uses CFA to establish both the common Left Signatures and common
Right Signatures (or common Cradles, as the case may be) between two different groups
of words and/or word strings. Importantly, this embodiment also examines the Left
Signature word strings of a query and compares them with the Right Signature word
strings of a term entered by the user and identified as an opposite of the query, seeking to
identify exact matches between them. This embodiment also examines the Right
Signature word strings of a query and compares them with the Left Signature word
strings of the opposite terms entered by the user, seeking to identify exact matches
between them. Often, these patterns between terms of identical ideas occurring on
opposite sides (or contexts) of the query and its opposites will be indicative of a
particular relationship. When the user provides the system with examples that
characterize the relationship between them, the system can examine and identify which
Left Signatures of one of the examples of the query or its synonyms are exactly the same
as the Right Signatures of examples of the words and word strings representing the
opposite idea of the query, and vice versa. Finding the word strings that are the Right
Signature of a query and the Left Signature of the query's opposite, or identifying word
strings that are Left Signatures of the query and are also Right Signatures of the query's
opposites, can help provide the basis to identify those word string patterns that
characterize that relationship. When the system identifies terms on a CFA Knowledge
Acquisition List of related knowledge that it has not encountered before but that have this
"opposite Signature" relative to the query, the system can identify the relationship of the
return to the query as opposites.
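
The cross-side comparison described above can be sketched as two set intersections. The
sketch assumes single-word terms, whitespace tokenization and fixed two-word Signatures;
the names signatures and cross_side_matches and the three-sentence corpus are hypothetical.

```python
def signatures(term, sentences, size=2):
    """Collect the size-word strings directly left and right of each occurrence."""
    left, right = set(), set()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if word != term:
                continue
            if i >= size:  # enough words to the left
                left.add(" ".join(words[i - size:i]))
            if i + size < len(words):  # enough words to the right
                right.add(" ".join(words[i + 1:i + 1 + size]))
    return left, right


def cross_side_matches(query, candidate, sentences, size=2):
    """Exact matches between opposite-side Signatures of two terms."""
    q_left, q_right = signatures(query, sentences, size)
    c_left, c_right = signatures(candidate, sentences, size)
    return {
        "query_right_is_candidate_left": q_right & c_left,
        "query_left_is_candidate_right": q_left & c_right,
    }


corpus = [
    "the soup was hot rather than cold today",
    "he prefers cold rather than hot drinks",
    "serve it hot or else cold",
]
matches = cross_side_matches("hot", "cold", corpus)
print(sorted(matches["query_right_is_candidate_left"]))  # ['or else', 'rather than']
print(sorted(matches["query_left_is_candidate_right"]))  # ['rather than']
```

Strings such as "rather than" that sit to the right of one term and to the left of the
other are the kind of cross-side match that training examples could associate with
opposites.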

These Signature and Cradle patterns that are unique to opposites can form the
pattern that allows the system to be trained to identify opposites in the future. Different
opposites will identify patterns that will generalize to certain other opposites the system
has yet to encounter. New opposite relationships the system encounters performing
RCFA or ICFA for related knowledge (including semantic equivalents) may not be
captured by the training conducted with previous opposite Cradles or Signatures. When
such a case occurs, and the user identifies to the system a result on a Knowledge
Acquisition List that is a semantic opposite of the query word string, the system can use
the query word string and the semantic opposite word string return for further training to
identify the relationship of Signatures (or Cradles) to this type of opposite.

The same type of training technique described for opposites can be used to train
the system to recognize other relationships. The system uses examples to find Signature
(or Cradle) word string context patterns that are unique to the relationship and therefore
define it. For example, the system can be trained to recognize class members of a query
or examples of a query by providing the system with the different word string examples
that characterize the semantic relationships. The system will then identify the pattern of
Cradles (or Signatures) that are unique to each group of words and/or word strings which
can be used to identify such relationships in the future.

The method and system identifies identical matches of the Right Signature of the
query to the Left Signature of a return, and of the Left Signature of the query to the Right
Signature of a return, to establish Signature word string patterns to identify the
relationship, as well as identifies Cradles that are exclusive only to the opposites but not
to true semantic equivalents (or other relationships). This process compares Left
Signatures to Left Signatures and Right Signatures to Right Signatures using standard CFA
techniques, except that instead of looking for only common Cradles to the query, the system
looks for Cradles shared by the query's opposites but not by the query. By identifying
Cradles unique to a query's opposite, this word string pattern can be used to help identify
terms that are opposite to other terms.

For example, a unique pattern of the query's Signatures or Cradles that is not
shared by the opposite of the query will often include Signatures or Cradles that contain
the query's opposites as part of the Cradle or Signature word string, as illustrated below.
For instance, three hypothetical Cradles for "hot" found in a corpus of documents might
be:

"it's not ___ it's cold"

"I'm not ___ I'm cold"

"you promised it would be ___ but it's cold"

The opposite term "cold" is part of the word strings that make up the unique
Signatures to the query word "hot" that the word "cold" will not share. This along with
other word string Signatures or Cradles that are unique to "hot" and not to "cold" will
identify "cold" as an opposite of "hot" even though "cold" may rank high on the
Knowledge Acquisition List using CFA for the term "hot" before this embodiment or
other embodiments of the present invention for Knowledge Acquisition List filtering and
sorting are used.
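
The idea of Cradles that the query fills, that its opposite does not fill, and that themselves
contain the opposite can be sketched as a set difference. The sketch assumes single-word
terms, whitespace tokenization and fixed two-word Cradle sides; the names cradles and
cradles_naming_opposite and the four-sentence corpus are hypothetical.

```python
def cradles(term, sentences, size=2):
    """Collect (left, right) word-string pairs surrounding each occurrence of term."""
    found = set()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            # Require a full-size string on both sides of the term.
            if word == term and i >= size and i + size < len(words):
                left = " ".join(words[i - size:i])
                right = " ".join(words[i + 1:i + 1 + size])
                found.add((left, right))
    return found


def cradles_naming_opposite(query, candidate, sentences, size=2):
    """Cradles the query fills, the candidate does not, and that contain the candidate."""
    unique = cradles(query, sentences, size) - cradles(candidate, sentences, size)
    return {c for c in unique if candidate in (c[0] + " " + c[1]).split()}


corpus = [
    "it's not hot it's cold",
    "I'm not hot I'm cold",
    "the water is hot enough today",
    "the water is cold enough today",
]
evidence = cradles_naming_opposite("hot", "cold", corpus)
print(sorted(evidence))
# [("I'm not", "I'm cold"), ("it's not", "it's cold")]
```

The Cradle ("water is", "enough today") is filled by both terms and is therefore
discarded, which mirrors the distinction drawn above between shared Cradles and Cradles
unique to the query.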

The results show a pattern, formed by the Signatures (or Cradles), that identifies a
unique type of relationship. The system can then use this pattern to identify other word
and/or word string pairs that also share the "relationship identifying" pattern formed by
the comparison of their Signatures (or Cradles). Thus, in an embodiment of the
invention in which the system is queried with a word or word string to identify words
and/or word strings with the opposite meaning, the system will (1) identify the most
frequent words and/or word strings surrounding that query, (2) identify the list of words
and/or word strings that have some Signatures (or Cradles) in common with the query, but
not of the type or with the number or percentage of commonality that would identify them
as a synonym, (3) then compare the Signatures (or Cradles) these related (but not
synonymous) words and/or word strings share with the query (both left to right and right
to left, and left to left and right to right, as described above) and (4) compare the results
from Step 3 with the Signatures of previously identified opposite word and/or word string
pairs. If any of the comparisons generated in Step 3 have a pattern that is similar enough
(user-defined) to the pattern formed by Signature comparisons between known opposites
(based on the Signatures or Cradles identified in Step 3 that are indicators of an opposite),
the system will identify the word or word string from Step 2 that contrasted with the
query to form that pattern and identify it as the opposite of the query.

These same principles apply for the system to identify any relationship between a
Knowledge Acquisition List return and a query including not only synonyms and
opposites, but also members of a common class (e.g., "red" and "blue" are colors; "New
York" and "Paris" are places) and any other semantic relationship. By locating the
common Left to Left and Right to Right Signatures as well as common Left to Right and
Right to Left Signatures between two words and/or word strings, patterns will emerge
that characterize these relationships for automatic identification of the relationship by the
system for future pairs of terms that share that relationship defined by those related
Signatures. The system can also automatically "cluster" groups of words and/or word
strings by their common Signatures and Cradles that are unique to that group as well as
identify their relationships to other groups.

It should also be noted that the user-defined parameters for the system to produce 
word string equivalents (or any other relationship) can include word strings in close
proximity to the query and not just directly adjacent to the query on the left or right side.
Adjusting the user-defined parameters may be particularly desirable in applications where 
expression of semantic meaning is typically less efficient or less structurally conventional 
(e.g., conversations fixed in an Internet "chat room" medium and other types of 
conversations). 

VI. SINGLE-STATE KNOWLEDGE LISTS FOR USE IN CROSS-STATE 
KNOWLEDGE ACQUISITION AND RECONSTRUCTION 
(TRANSLATION) 

Additional embodiments of the present invention utilize the system and method 
for generating a list of semantic equivalents to aid in the present invention's use for the 
translation of languages. It can be used to perform translation as an alternative to, or in 
conjunction with, any of the methods of the present invention that identify word string 
translations to be added to the cross-language database. 

The methods and systems of the present invention can be used to produce 
semantic equivalents to be used as an aid to any corpus-based machine translation system 
(e.g., EBMT), including the machine translation aspect of the present invention. Any 
number of embodiments using semantic equivalents of word strings in the Source 
Language and in the Target Language can be used to produce, test and verify accurate 
translation. Moreover, other embodiments can use translations of Signatures or Cradles 
to help complete accurate translation. 

For instance, if a word string translation is needed to complete a translation and it 
cannot be found in the cross-language association database and cannot be built using 
available Parallel Text, the system can generate semantic equivalents for the unknown
translation in the Source Language and see if any of the semantically equivalent word
strings have known translations in the Target Language in the database, or can be learned 
based on available cross-language text. 

Alternatively, a word string translation in the Target Language may be in the
cross-language association database, but it may not overlap with the contiguous word
string translations on both sides as required by the dual-anchor overlap technique. In
such a case, the translation would not be approved by the dual-anchor overlap
requirement, but the Target Language word string translation can be used to produce
semantically equivalent word strings in the Target Language which can then be tested for
overlap with its neighbors to be approved as a complete translation.

Another example of how the system and method for generating a list of semantic 
equivalents can be utilized in a translation database is as follows: 

First, generate two specific Signatures of a user-defined size to the left and right 
of the portion of the Source document that is yet to be resolved. For example, assume 
that the system is translating the sentence "I went to the ball park to watch the baseball 
game". Moreover, assume that cross-language overlapping translations for "I went to 
the", "went to the ball park", "to watch the", and "watch the baseball game" are known to 
the system. The system does not have a Target Language word string translation for a 
phrase that overlaps with both "went to the ball park" and "to watch the", for 
example, "ball park to watch" (this is known as an unresolved phrase or portion), which is 
needed to provide the overlapping connection to approve the translated sentence with 
contiguous overlapping word strings in both languages. If the user-defined parameters 
are defined as the three-word word string immediately to the left of the unresolved 
phrase, and the three-word word string immediately to the right of the unresolved phrase, 
the present invention returns two three-word word strings: a "Specific Left Signature 
Word String" ("went to the") and a "Specific Right Signature Word String" ("the 
baseball game"). 
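
By way of non-limiting illustration, the selection of the Specific Left and Right 
Signature Word Strings in the example above can be sketched in Python. The function 
name `specific_signatures`, its parameters, and the whitespace-based word splitting are 
assumptions made for this sketch only; they are not taken from the described system. 

```python
def specific_signatures(sentence, unresolved, size=3):
    """Return the word strings of `size` words immediately to the left and
    right of the first occurrence of `unresolved` within `sentence`."""
    words = sentence.split()
    target = unresolved.split()
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            # Up to `size` words before the unresolved phrase.
            left = words[max(0, i - size):i]
            # Up to `size` words after the unresolved phrase.
            end = i + len(target)
            right = words[end:end + size]
            return " ".join(left), " ".join(right)
    raise ValueError("unresolved phrase not found in sentence")


sentence = "I went to the ball park to watch the baseball game"
left_sig, right_sig = specific_signatures(sentence, "ball park to watch", size=3)
print(left_sig)   # went to the
print(right_sig)  # the baseball game
```

The `size` argument plays the role of the user-defined Signature size; near the start or 
end of a sentence the sketch simply returns fewer words. 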

Second, using any of the previously described embodiments for creating semantic 
equivalent associations, generate Signature Lists (using in this example ICFA) for the 
unresolved phrase from a Document Database in the Source Language. The lists created 
using the above-described semantic equivalent system and method on the unresolved 
phrase are called the Left Signature List and the Right Signature List. 

Third, translate both the Specific Left Signature Word String and all the entries on 
the Left Signature List to the Target Language. The translations can be obtained using 
any method of the present invention or any device known in the art. Results using 
translation systems known in the art can be improved by using the present invention's 
multilingual leverage embodiment, previously described. The result of this process is the 
"Left Target Signature List." Conduct a similar translation process on the Specific Right 
Signature Word String and all the entries on the Right Signature List to create a "Right 
Target Signature List." 

Fourth, using Steps 2 and 4 above of the semantic equivalent process, generate 
Target Language Anchor Lists from the Left and Right Target Signature Lists using a 
Target Language Document Database. The resulting lists from this process are, 
respectively, the Left Target Anchor Lists and the Right Target Anchor Lists. 

Finally, compare the returns of the Left Target Anchor Lists with the returns of 
the Right Target Anchor Lists. The results that appear on at least one of the Left Target 
Anchor Lists and one of the Right Target Anchor Lists are potential translations of the 
query and are ranked according to the total number of Anchor Lists on which they 
appear. Extra weighting for the ranking can be given for appearances on the Anchor 
Lists derived from the Specific Context Word Strings for greater precision. Rankings can 
also be determined by multiplying the number of Left Anchor Lists by the number of 
Right Anchor Lists that a result appears on. Additionally, some weight for the total 
frequency of returns and/or any other user-defined criteria can be included as a factor in 
ranking results. 
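
As a minimal sketch of the two ranking rules just described (total number of Anchor 
Lists, or the product of the Left and Right counts), the following Python fragment may 
be helpful. The function name `rank_candidates`, the `multiply` flag, and the toy list 
contents are hypothetical; the extra weighting for Specific Context Word Strings and 
frequency-based factors are omitted for brevity. 

```python
from collections import Counter


def rank_candidates(left_lists, right_lists, multiply=False):
    """Rank results that appear on at least one left and one right Anchor List.

    Score is the total number of lists a result appears on, or, if
    `multiply` is True, the left count times the right count.
    """
    # Count, per candidate, how many lists on each side contain it.
    left = Counter(c for lst in left_lists for c in set(lst))
    right = Counter(c for lst in right_lists for c in set(lst))
    common = left.keys() & right.keys()

    def score(c):
        return left[c] * right[c] if multiply else left[c] + right[c]

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(common, key=lambda c: (-score(c), c))


left_lists = [["t1", "t2"], ["t1", "t2"], ["t1", "t3"]]
right_lists = [["t1", "t2"], ["t2", "t4"]]
print(rank_candidates(left_lists, right_lists))                 # ['t1', 't2']
print(rank_candidates(left_lists, right_lists, multiply=True))  # ['t2', 't1']
```

In the toy data, "t1" and "t2" tie under the additive rule (four lists each), while the 
multiplicative rule favors "t2", which is balanced across both sides. 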

Of course, like any application using ICFA, the above embodiment can be 
similarly accomplished using RCFA with Specific Context Cradles for the query and 
other high frequency general Cradles as described above. In such a case, Specific 
Cradles to the exact context as well as General Cradles are generated in the Source 
Language, and then translated to Target Language Cradles. Then, the Target Language 
Cradles are used on a Target Language corpus to fill the Cradles with other Target 
Language word strings. 

Another embodiment using semantic equivalents to build a database of potential 
translations for a query, given an unresolved phrase, is as follows: 

First, using only Specific Left and Right Signature Word Strings of the 
unresolved phrase of the query, generate Anchor Lists, as described above. Then, using 
Left and Right Signature Lists (without the Specific Left and Right Signature Word 
Strings), generate the Left and Right Anchor Lists, as described above. The results that 
appear on (a) at least one of the Left Anchor Lists and/or the Anchor List derived from 
the Specific Left Signature Word String and (b) at least one of the Right Anchor Lists 
and/or the Anchor List derived from the Specific Right Signature Word String are then 
ranked according to the total number of Anchor Lists on which they appear. Extra 
weighting for the ranking can be given for appearances on the Anchor Lists derived from 
the Specific Context Word Strings. Alternatively, multiplication of the number of Right 
Anchor and Left Anchor Lists a return appears on can be used for ranking or any other 
user-defined method. 

Next, the unresolved portion of the translation query and its list of semantic 
equivalents generated by the ranking described above are then translated into the Target 
Language. The translations can be obtained using either the present invention's Parallel 
Text database builder (using available Parallel Text), any of the other methods of the 
present invention for building word string translations, or other translation devices known 
in the art. Results using translation systems known in the art can be improved by using 
the present invention's multilingual leverage embodiment previously described. If a 
user-defined number of translation results are identical, the result can be designated as a 
potential translation. To further the analysis, in another embodiment, for each of the 
translation results, the system generates a list of semantic equivalents using a database of 
text in the Target Language. The original Target Language translations that appear on 
the largest number of the lists (but at least two of the lists) with a threshold minimum 
ranking on those lists (absolute and/or relative) are designated as potential translations of 
the unresolved portion of the query. 

All embodiments using semantic equivalent analysis to aid in the translation of 
unresolved word string translations can also produce additional Signatures or Cradles by 
using the Specific Context Word Strings and performing CFAs to produce semantic 
equivalents of the Specific Left Signature Word String (or Cradle) and semantic 
equivalents of the Specific Right Signature Word String (or Cradle). These semantic 
equivalents of the specific Signatures or Cradles can be used as additional Signatures or 
Cradles to build semantic equivalents in the Source Language, or be translated directly to 
the Target Language to build Target Language semantic equivalents using translated 
Signatures or Cradles. 

As another embodiment to translate documents from one language to another 
using ICFA or RCFA, sentences and other segments of documents to be translated are 
parsed word-for-word and a Knowledge Acquisition List is generated for every word to 
be translated as well as corresponding Left and Right Signature word strings. Using the 
words in the Source Language, and a cross-language dictionary between both languages, 
possible translations for each word can be assembled in the Target Language. These 
Target Language words are used to generate Knowledge Acquisition Lists for each word 
in the Target Language. A derivation of the dual-anchor overlap technique looks for 
overlapping word strings found in each Knowledge Acquisition List of neighboring or 
close proximity words in the Source Language and the same is done in the Target 
Language. Using the cross-language dictionary, the words in the overlapped word strings 
on Knowledge Acquisition Lists in the two languages are tested against each other to see 
if they are translations for one another. If a user-defined threshold of words translate 
accurately in the overlapped word strings on the Knowledge Acquisition Lists, those 
strings can be approved as translations. Word string translations can be further verified 
using the dual-anchor overlap technique to connect the translation to contiguous word 
strings. The same technique can be used with parsed units larger than one word (e.g., two 
words), in which case the present invention for translation or an existing translation 
engine known in the art would act as a translation bridge between languages instead of a 
cross-language dictionary. 
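
The dictionary-based threshold test described above can be illustrated, under stated 
assumptions, by the following Python sketch. The names `translated_fraction` and 
`approve`, the 0.75 threshold, and the tiny English-to-Spanish dictionary are all 
hypothetical and chosen only to make the example self-contained. 

```python
def translated_fraction(source_words, target_words, dictionary):
    """Fraction of source words with at least one dictionary translation
    present among the target words."""
    target_set = set(target_words)
    hits = sum(1 for w in source_words if dictionary.get(w, set()) & target_set)
    return hits / len(source_words)


def approve(source_string, target_string, dictionary, threshold=0.75):
    """Approve a candidate word string pair if enough words translate."""
    src = source_string.lower().split()
    tgt = target_string.lower().split()
    if not src:
        return False
    return translated_fraction(src, tgt, dictionary) >= threshold


# Toy cross-language dictionary (assumption for illustration only).
toy_dictionary = {
    "the": {"el", "la"},
    "red": {"rojo", "roja"},
    "house": {"casa"},
}

print(approve("the red house", "la casa roja", toy_dictionary))    # True
print(approve("the red house", "la casa grande", toy_dictionary))  # False
```

Word order is deliberately ignored in this sketch, since the overlapped word strings in 
the two languages need not share the same ordering. 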

Additionally, the techniques of the present invention that identify a specific 
quality of semantic relationship that a word or word string has to other words or word 
strings can be used in translation applications by utilizing a method of the present 
invention that allows interchangeable semantic terms to be tokenized when searching for 
Source Language word strings and/or Target Language word strings to identify 
translations. For example, assume you are trying to translate a word string in Language 
X that means "tell Bob to come downstairs" into English using one of the methods of the 
present invention. If the Language X and/or English text does not have that word string, 
but has the word strings "tell Jim to come downstairs" and "tell Mary to come 
downstairs", it is desirable to use these word strings to help identify the translation by 
using a "name token" instead of the word "Bob" and then substituting the translation for 
"Bob" for the name token in the final output translation. 

It is known in the art to use class tokens in translation for known equivalence 
classes like names, dates, numbers, and days, which are usually interchangeable with one 
another in a translation, so one translation of the form will serve as a translation for all 
class members. These techniques known in the art look to populate the equivalence class 
ahead of time with known members so they can be identified when they're encountered. 
While this method works well for known class members that fit only one class, if a word 
fits two or more classes, or if the system encounters a word or word string that is a 
member of a class but is unfamiliar to the system (e.g., an unfamiliar name), the state of 
the art cannot use the class token when searching Target text for translation candidates. 

The present invention provides a method for using class tokens for words and 
word strings that are not known class members to the system. This method analyzes any 
word string that is not represented in the cross-language database or corpus and looks to 
see if any of the words or sub-strings within the larger unknown word string (or an 
extension of it created by adding the contiguous words before and/or after the unknown 
word string) is a Signature (or Cradle) that identifies a word or word string in the larger 
unknown string as a member of a class that can be tokenized. 

For example, if the word string to be translated means "tell Jerome to come 
downstairs" and the system does not have this word string translation in the database and 
cannot find it in the available documents, the system may identify that the Cradle "tell 
___ to come downstairs" is a possible "name class" indicator and that the word 
"Jerome" appears in enough other word strings in the corpus to meet a user-defined 
number or percentage of name Cradles to be classified as a name token. The system can 
use this information to use the word strings from the corpus that have the Cradle "tell 
___ to come downstairs" with any other name filling the Cradle to build the translation 
for "tell Jerome to come downstairs" once the name Jerome is tokenized. 
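
A minimal Python sketch of this Cradle-based classification follows. It assumes that a 
Cradle can be represented as a (left, right) pair of word strings with a single-word gap, 
and that class membership is decided by a simple count of matching Cradles; the names 
`cradle_fillers` and `is_class_member`, the toy corpus, and the threshold value are 
hypothetical. 

```python
import re


def cradle_fillers(corpus, left, right):
    """Return the single words found between `left` and `right` in the corpus."""
    pattern = re.compile(
        r"\b" + re.escape(left) + r"\s+(\w+)\s+" + re.escape(right) + r"\b"
    )
    return [m.group(1) for text in corpus for m in pattern.finditer(text)]


def is_class_member(word, corpus, class_cradles, min_cradles=2):
    """Treat `word` as a class member if it fills at least `min_cradles`
    of the Cradles known to indicate that class."""
    hits = sum(
        1 for left, right in class_cradles
        if word in cradle_fillers(corpus, left, right)
    )
    return hits >= min_cradles


corpus = [
    "tell Jim to come downstairs",
    "tell Mary to come downstairs",
    "my name is Jerome and I am new here",
    "my name is Mary and I work here",
    "please ask Jerome to call me",
    "please ask Jim to call me",
]
name_cradles = [
    ("tell", "to come downstairs"),
    ("my name is", "and"),
    ("please ask", "to call me"),
]

print(cradle_fillers(corpus, "tell", "to come downstairs"))  # ['Jim', 'Mary']
print(is_class_member("Jerome", corpus, name_cradles))       # True
```

Once "Jerome" is accepted as a name token, the word strings returned for the Cradle 
"tell ___ to come downstairs" with other names can serve as templates for building the 
translation. 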

Moreover, any time a word or word string has two meanings and only one 
meaning is part of a certain class, the specific Cradle (or independent Left and Right 
Signatures) will determine which meaning is used. For instance, if the sentence is "give 
me the blue paint before you go", the system can tokenize "blue" as a color based on the 
Cradle "give me the ___ paint" and other known Signatures for "blue" that establish it 
as a color. If, however, the word string is "I feel blue since the breakup", the system will 
not tokenize "blue" as a color because the Cradle does not fit the color class but can 
replace it with a word like "sad" that is a member of the "emotions" class along with 
"blue" based on the above methods. 

VII. SINGLE-STATE KNOWLEDGE RECONSTRUCTION 

Just as the dual-anchor overlap technique pieces together appropriate neighboring 
word string translations across languages, the same overlap technique can be used to 
restate any longer idea in a number of different ways in a single language by parsing the 
longer idea into overlapping sub-units, generating semantic equivalents for the sub-units, 
and substituting synonymous sub-units for original text when a synonymous sub-unit 
overlaps with its neighbors (neighbors can be original text or synonyms of original text). 
This is an effective application for text mining and search and retrieval as well as voice 
recognition, natural language interfaces and more complex artificial intelligence 
applications. 

For example, take the statement "when I get home from school I must do my 
homework before I go out to play with my friends". The semantically equivalent phrases 
for the following parsed sub-units may be known to the system by conducting RCFA or 
ICFA knowledge acquisition analysis along with semantic equivalent filtering 
techniques: 

1. "when I get home from school I must" 

a. "when I come home from school I must" 

b. "when I arrive home from school I better" 

c. "as soon as I come home from school I have to" 

2. "I must do my homework before I go out" 

a. "I have to do my homework before I go out" 

b. "I better do my schoolwork before I head out" 

c. "I must get my homework done before I leave the house" 

3. "go out to play with my friends" 

a. "head out to play with my friends" 

b. "leave the house to hang out with my posse" 

c. "go out to hang with my buddies" 

The above semantically equivalent lists of word strings, plus the overlap technique, can 
provide a variety of alternative ways of expressing the entire original statement. For 
example, an alternative statement might be: 

when I arrive home from school I better 

I better do my schoolwork before I head out 

head out to play with my friends 

After eliminating the redundancy, the system presents "when I arrive home from school I 
better do my schoolwork before I head out to play with my friends" as a synonymous 
expression to the original query. 
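
The elimination of redundancy between overlapping sub-units can be sketched as follows. 
This Python fragment is an illustration only: the function name `merge_overlapping` and 
the rule of taking the longest word-level overlap between neighbors are assumptions, 
not a statement of how the described system is implemented. 

```python
def merge_overlapping(segments):
    """Join word strings whose neighbors overlap, keeping the shared words once.

    For each neighbor, the longest suffix of the text built so far that
    equals a prefix of the next segment is treated as the overlap.
    """
    merged = segments[0].split()
    for segment in segments[1:]:
        words = segment.split()
        overlap = 0
        for k in range(min(len(merged), len(words)), 0, -1):
            if merged[-k:] == words[:k]:
                overlap = k
                break
        if overlap == 0:
            # Without an overlap the substitution would not be approved.
            raise ValueError(f"no overlap before segment: {segment!r}")
        merged.extend(words[overlap:])
    return " ".join(merged)


segments = [
    "when I arrive home from school I better",
    "I better do my schoolwork before I head out",
    "head out to play with my friends",
]
print(merge_overlapping(segments))
# when I arrive home from school I better do my schoolwork before I head out to play with my friends
```

Raising an error when no overlap exists mirrors the requirement that a synonymous 
sub-unit is substituted only when it overlaps with its neighbors. 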

VIII. SCOPE OF CFA APPLICATIONS 
A. In General 

At its core, the association database building technique involves (i) taking a unit 
of data organized in a linear or ordered fashion, (ii) breaking the data down to all possible 
contiguous subsets of the whole, and (iii) building relationships between all subsets of 
data, based on the frequency of recurring subsets' (generally close) proximity to one 
another in all units of data available for study. At the core of CFA, the system identifies 
frequently recurring proximity relationships between groups of recurring data segments 
to illuminate certain associations shared by two or more recurring data segments. 
Therefore, the same techniques used in the database creation and Common Frequency 
Analysis can be employed to recognize patterns for many other types of data mining, text 
mining, target recognition, and any other application that requires the recognition of 
patterns between associated ideas. Moreover, these tasks are not limited to finding word 
string patterns in text. 
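
Steps (i) through (iii) can be illustrated with a short Python sketch. It is a 
simplification under stated assumptions: a "unit of data" is taken to be a 
whitespace-separated string, "proximity" is reduced to co-occurrence within the same 
unit, and the names `contiguous_subsets` and `association_counts` are hypothetical. 

```python
from collections import Counter
from itertools import combinations


def contiguous_subsets(tokens, max_len=None):
    """Return every contiguous run of tokens, optionally capped at max_len."""
    n = len(tokens)
    max_len = max_len or n
    return [
        tuple(tokens[i:j])
        for i in range(n)
        for j in range(i + 1, min(n, i + max_len) + 1)
    ]


def association_counts(units, max_len=2):
    """Count how many units contain each pair of distinct contiguous subsets."""
    counts = Counter()
    for unit in units:
        subsets = sorted(set(contiguous_subsets(unit.split(), max_len)))
        counts.update(combinations(subsets, 2))
    return counts


print(contiguous_subsets("a b c".split()))
# [('a',), ('a', 'b'), ('a', 'b', 'c'), ('b',), ('b', 'c'), ('c',)]

counts = association_counts(["a b", "a b c"], max_len=1)
print(counts[(("a",), ("b",))])  # 2
```

A fuller treatment would weight pairs by their distance within each unit rather than 
treating all co-occurrences in a unit equally. 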

For language translation, the embodiments of ideas are represented in documents; 
for music, the embodiments might be digital representations of a music score and sound 
frequencies denoting the same composition, and the like. Using the two mediums of 
video and audio, an association between a video clip of a baseball player swinging and 
missing to strike out, and the word string "strike out" might be associated using similar 
techniques. The consistent general visual representation of a baseball player swinging 
and missing and then going back to the dugout, and the word string "strike out" (or a 
sound frequency that is known to mean "strike out"), over a significant sample size, will 
have a very high cross-idea frequency. The mechanism to generalize the understanding 
of swinging and missing when encoded as visual data once developed will allow the 
system to operate in this situation. 

As another example, a common goal of visualization software involves the 
analysis of visual images by a system to determine automatically whether or not a person 
is in an image. While it is a very difficult task for current state of the art visualization or 
image recognition technologies, the present invention can use CFA to learn the Signature 
of "people" by finding proximate features (e.g., within a given radius) in the section of the 
image that corresponds to a person. This embodiment calls for providing the system with 
a corpus of images on which to train to find the distinguishing factors between pixel arrays 
that make up people versus pixel arrays that make up things other than people. One 
method has the system use pictures taken with both light-sensitive lenses and infrared 
sensors that will identify objects emitting heat. The system will then train to recognize the 
pattern of light-sensitive pixels that define the relationship between objects emitting heat 
and those that don't. Of this heat-emitting group, the system can then further refine the 
training on pixel patterns to distinguish between the heat-emitting non-human elements 
(other animals, fire, etc.) and people. 

As a general matter, the present invention defines any given "subject idea" based 
on the sequence of ideas that appear around that subject idea in all its contexts. In a sense, 
the invention defines each subject idea by the universe of ideas surrounding it, including 
the ideas found leading up to the subject idea and the ideas found following the subject 
idea, regardless of the forms in which the ideas are expressed. When an idea is expressed 
in written language, there is the dimension of "time" (as expressed by flow, order, or 
sequence) to surround and define it. The Left Signatures in the English language represent 
the different ideas occurring just prior in "time" to any query and the Right Signatures in 
the English language represent the different ideas that are found following a query idea in 
"time." 

Representations of ideas in certain mediums other than text add additional 
dimension to the "space" surrounding a subject idea. These additional dimensions supply 
other defining contexts for a subject idea, in addition to the context that multiple units of 
time provide for an idea. For example, spoken language adds context (signatures) in the 
form of tone, intonation, and cadence, among others, for each idea in a sequence of ideas 
(in addition to the still very important identification of ideas just before and after the 
subject idea). Visual representations of an idea add the surrounding physical (or 
perceived) dimension to provide additional context to an idea that is not moving through 
time, as well as from the sequence of ideas that come before and after it, if it is moving 
through time. Of course, audio-visual representations of ideas, and other simultaneous 
multi-sense representations add a number of dimensions of surrounding contexts that help 
define each isolated idea in time, in addition to the important context provided by the 
sequence of surrounding ideas over multiple units of time. 

B. Data Compression 

Once knowledgebases of ideas are generated within a single state using CFA (or 
across states using cross-state knowledge acquisition), the different words and word 
strings that articulate the same idea within each language and across different languages 
can be commonly identified by assigning each idea a number or some other unique 
efficient identifying label or token. This naturally provides a very powerful data 
compression method and system. If expressions in existing states are assigned specific 
associations with data points in another state and catalogued in a database, conversions 
between those two states will be possible. 

For example, each "idea" represented in a form, state, or language can be 
assigned a number (or a frequency on the electromagnetic spectrum). When a 
combination of ideas is to be transferred from one location to another, the ideas can be 
parsed into overlapping ideas, and those representations of parsed ideas can be converted 
to their assigned tokens (e.g., number, electromagnetic frequency, etc.). By using these 
tokens, the amount of data that must be transferred from one location to another using the 
electromagnetic spectrum or other forms of bandwidth (along with sending encoder 
machines and receiving decoder machines) is reduced. 

Transmission of an idea will require transmitting the pair (idea, unique number) 
the first time, and just the number all subsequent times. For multi-processor realizations 
of the technology in this invention, the same efficient internal transmission between 
processors may be implemented as transmission of ideas at a distance (e.g., by unique 
number). The ideas once transmitted are decoded by substituting their unique identifier 
with the idea description - regardless of how the unique identifier is encoded: a number, 
an electromagnetic frequency, or any other identifier. 
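
The transmission scheme just described, in which the pair (idea, unique number) is sent 
the first time and only the number thereafter, can be sketched in Python as follows. The 
class names `IdeaEncoder` and `IdeaDecoder`, the use of sequential integers as 
identifiers, and the message format are assumptions made for illustration only. 

```python
class IdeaEncoder:
    """Sending side: assigns each new idea the next unused integer."""

    def __init__(self):
        self._ids = {}

    def encode(self, idea):
        if idea in self._ids:
            # Already transmitted once: send only the number.
            return (self._ids[idea], None)
        new_id = len(self._ids)
        self._ids[idea] = new_id
        # First transmission: send the number together with the idea.
        return (new_id, idea)


class IdeaDecoder:
    """Receiving side: rebuilds ideas from their identifiers."""

    def __init__(self):
        self._ideas = {}

    def decode(self, message):
        idea_id, idea = message
        if idea is not None:
            self._ideas[idea_id] = idea
        return self._ideas[idea_id]


encoder, decoder = IdeaEncoder(), IdeaDecoder()
stream = ["go out to play", "with my friends", "go out to play"]
messages = [encoder.encode(idea) for idea in stream]
print(messages)
# [(0, 'go out to play'), (1, 'with my friends'), (0, None)]
print([decoder.decode(m) for m in messages] == stream)  # True
```

The saving comes from the third message, which carries only the identifier; the more 
often an idea recurs, the greater the reduction. 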

IX. SINGLE-STATE CFA FOR SMART APPLICATIONS 

The present invention, in another embodiment, can be instructed by the user to 
automatically carry out certain CFAs based on the identification of certain combinations 
of patterns of two or more different word strings that occur together in a question, request 
or statement. The user would instruct the system that the presence of the pattern of two 
or more different word strings (after various alternative parsings into two or more word 
strings of various sizes identify known word string combinations in certain proximities or 
order) is part of a complex category bin that triggers certain CFAs. These CFAs may 
require the system to access previously learned information from previous CFAs now 
stored in a knowledgebase, or may require the learning of new information from a 
Document Database (or the web or other available corpus) to be used and stored in the 
knowledgebase for future use. With each result of a CFA, the system will retrieve 
information from the knowledgebase or, based on previous training and triggers set by 
the user (or triggers that are self-learned by the system), carry out the next CFA (or a 
series of CFAs that are triggered by the previous CFA) until the system has given an 
answer to a question or performed a task. 

The system can use the methods of the present invention to generate 
Knowledge Acquisition Lists and use the filtering techniques to identify semantically 
equivalent words and word strings for all parsed words and word strings in a request, 
question, or statement. In one embodiment, the method and system can be trained to 
recognize different types of questions. For example, if the system were asked a question 
such as "Where can I see kangaroos in America?", the system may have been trained to 
recognize what might be categorized by the user as the "Where Does One Find ___" 
category bin, previously trained and labeled by the user. The user can train the system to 
recognize various alternative forms of the question using the semantic equivalent 
generator (and the overlap technique) described above on one or more examples of this 
type of question. Once the system has been trained and can recognize the various specific 
examples of such questions, triggers can be set by the user when this type of question is 
identified that will initiate the prescribed next CFAs to be performed to provide an 
answer to the question. 

For example, the system will learn via semantic equivalent analysis and filtering 
that "where can I go to see ___", "where can you tell me to go to see ___" and 
"where can I find ___" are all members of the "Where Does One Find ___" 
question category bin. 
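
A simple way to picture membership in a question category bin is prefix matching 
against the bin's known forms, as in the Python sketch below. The function name 
`classify`, the `___` slot marker, the dictionary layout, and the inclusion of 
"where can I see ___" as a member are assumptions for illustration; in the described 
system the members would be learned through semantic equivalent analysis rather than 
listed by hand. 

```python
def classify(question, bins):
    """Return (bin name, slot filler) for the first matching form, else None."""
    text = question.lower().rstrip("?.! ")
    for bin_name, templates in bins.items():
        for template in templates:
            # The words before the slot act as the recognizable pattern.
            prefix = template.lower().replace("___", "").strip()
            if text.startswith(prefix + " "):
                return bin_name, text[len(prefix):].strip()
    return None


bins = {
    "Where Does One Find ___": [
        "where can I go to see ___",
        "where can you tell me to go to see ___",
        "where can I find ___",
        "where can I see ___",
    ],
}

print(classify("Where can I see kangaroos in America?", bins))
# ('Where Does One Find ___', 'kangaroos in america')
print(classify("Tell me a joke", bins))  # None
```

The slot filler returned here ("kangaroos in america") is what would then be parsed 
further into its own category bins. 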

Likewise, the system will also assemble category or idea bins using semantic 
equivalent generation through RCFA or ICFA for "see kangaroos" (e.g., "watch 
kangaroos") and "in America" (e.g., "in the US"). The system can therefore recognize 
the presence of combinations of members of different classes that trigger the next set of 
words and/or word strings to be used to conduct a CFA. The user can therefore train the 
system to recognize these patterns of bin members in certain sequences so that they 
trigger the strategy of CFAs needed to identify the answer to this type of "Where Does 
One Find ___" question. 

Moreover, the "Where Does One Find" part may not be in the beginning of the 
sentence, for example "If I want to see kangaroos while I'm in America, where do you 
suggest I go?" Here, "where do you suggest I go" is the last idea in the sequence. The 
user will therefore train the system to recognize this form and sequence of concepts as 
members of the "Where Does One Find ___" question category bin for CFA analysis to 
perform artificial intelligence applications. 

In one embodiment, the user can set a trigger for the system so that when it is 
confronted by a sequence of ideas from category bins that pose a "Where Does One Find 
___" question, the system would provide an answer that fits the idea category bin of 
"Places" for it to be a good answer. Determining the correct place will be the goal of the 
CFAs that will be triggered by the recognition of the group of word strings in the "Where 
Does One Find ___" question. 

The user may train the system, when confronted by a "Where Does One Find 
___" type of question, to look for a member of the "Place" category bin that is most 
associated with (i.e., frequently directly next to (or near) the left or right of) the object the 
query requests to see, in this example, "kangaroos." What "places" are most associated 
with the "object" might merely entail frequency counts directly next to or near the left or 
right of the object in text, or may involve training the system to recognize specific word 
string Signatures or Cradles around the object that indicate you can find the object in a 
place. If this were the only information in the question, the highest related member of 
the "Places" bin to "kangaroos" might be "Australia." In the example, however, the 
question also contains a member of what a user might train the system to recognize as a 
"Place Restriction" category bin, "in America." The user can train the present invention 
to trigger a CFA between the thing that the questioner wants to see ("kangaroos") and the 
Place Restriction ("in America"). The highest associations between these two data 
segments might be "the zoo", "the San Diego Zoo" or "on TV." Note that "on TV" may 
not fit the conventional "Place" category bin. However, the query "where can I see" 
could fit into the "How Can One View ___" category bin (as well as the "Where 
Does One Find ___" bin). This would include "on TV," and therefore the smart 
application would allow answers from the "Place" bin as well as, for example, the "Ways 
to View Things" bin established by the user or learned by the system. 

Other more complicated questions may require the results of a CFA to trigger 
another CFA as part of a multi-step trigger scheme to address certain types of questions 
or requests. As above, the user can train the system to employ these trigger steps based 
on patterns of different word strings fitting general categories and the "thought process" 
or strategy the user has trained the system to employ. 
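
Returning to the "kangaroos" example, the triggered association between the object and 
the Place Restriction can be sketched in Python under strong simplifying assumptions: 
proximity is reduced to appearing in the same sentence, bin members are matched as plain 
substrings, and the corpus is a toy. The function name `rank_answers` and all data shown 
are hypothetical. 

```python
from collections import Counter


def rank_answers(corpus, object_bin, restriction_bin, answer_bin):
    """Count answer-bin members in sentences that mention both a member of
    the object bin and a member of the restriction bin."""
    counts = Counter()
    for sentence in corpus:
        text = sentence.lower()
        has_object = any(o in text for o in object_bin)
        has_restriction = any(r in text for r in restriction_bin)
        if has_object and has_restriction:
            for answer in answer_bin:
                if answer in text:
                    counts[answer] += 1
    return counts.most_common()


corpus = [
    "You can see kangaroos at the San Diego Zoo in America",
    "Kangaroos live in Australia",
    "In the US you can watch kangaroos at the zoo",
    "Visitors in America watch kangaroos at the San Diego Zoo",
]
object_bin = ["see kangaroos", "watch kangaroos"]
restriction_bin = ["in america", "in the us"]
answer_bin = ["the san diego zoo", "australia", "the zoo"]

print(rank_answers(corpus, object_bin, restriction_bin, answer_bin))
# [('the san diego zoo', 2), ('the zoo', 1)]
```

"Australia" drops out because no sentence pairs it with the Place Restriction, which is 
the effect the restriction bin is meant to have. 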

The system is trained by the user to employ certain triggers for certain CFAs as 
just described. As the user trains the system and a critical mass of triggers to solve 
problems is reached, the system will begin to learn how to trigger 
appropriate next step CFAs when confronted with a new pattern of word strings based on 
the similarity between the unfamiliar multiple word string patterns (using CFA semantic 
equivalent analysis plus overlap to judge similarity) and known multiple word string 
patterns that trigger certain CFAs. Next, the system will identify the similarities among 
that group of triggers and use them to set triggers for the new word string pattern. 
Moreover, the user may set triggers for strategies for the system to set automatic triggers 
to solve new problems. 

As will be understood by those skilled in the art, the skilled practitioner may 
make many changes in the apparatus and methods described above without departing 
from the spirit and scope of the invention. 

Appendix A — Knowledge Acquisition Lists 

(Examples with Partial Results) 

Knowledge Acquisition Engine 
meaningful machines Sample Results Using a Corpus of English: 2.4B words 

(In the tables of this appendix, "-" marks an entry that is not legible in this copy; 
rows that are entirely illegible are omitted.) 

Concept Mining Results for: watchful eye 

[Results table not legible in this copy.] 

Concept Mining Results for: meaningful 

Phrase: substantial [score and remaining rows not legible] 

Concept Mining Results for: demo 

Rank  Phrase         Relative Score 
-     trial          9 
4     evaluation     8 
6     copy           4 
-     pdf            3 
10    30-day trial   3 
12    booklet        3 
Concept Mining Results for: God 

[Results table not legible in this copy.] 

Concept Mining Results [query term not legible] 

Rank  Phrase            Relative Score 
2     meeting           73 
4     workshop          40 
-     briefing          27 
8     conferences       23 
10    conference held   18 
12    session           16 
15    congress          15 
-     meetings          13 
19    event             13 
21    committee         - 
23    course            10 
25    general meeting   9 

Concept Mining Results for: arizona 

Rank  Phrase            Relative Score 
2     florida           - 
4     iowa              42 
6     Illinois          40 
8     Colorado          37 
10    Utah              32 
12    American samoa    31 
15    Pennsylvania      - 
17    minnesota         - 
19    kansas            27 
21    louisiana         24 
23    the               23 
25    arkansas          22 

("-" marks an entry that is not legible in this copy; wholly illegible rows are omitted.) 
184 



Knowledge Acquisition Engine 

meaningful machines Sample Results Using a Corpus of English: 2.4B words 

Concept Mining Results for: world wide web 




Rank | Phrase | Relative Score
4 | www | 35
10 | web site | -
- | site | -
- | company's web | -
- | home | -
20 | official web | -
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Knowledge Acquisition Engine 

Meaningful Machines Sample Results Using a Corpus of English: 2.4B words

Concept Mining Results for: to analyze

Rank | Phrase | Relative Score
2 | to analyse | 12
4 | to improve | 8
6 | to evaluate | 7
8 | to examine | 7
10 | for | 6
12 | to use | -
15 | to obtain | 5
17 | to study | 4
19 | to investigate | 4
21 | to test | 4
23 | to offset | 4
25 | to isolate | 4
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f^^. Knowledge Acquisition Engine 

meaningful machines Sample Results Using a Corpus of English: 2.4B words 

Concept Mining Results for: information about 





Rank | Phrase | Relative Score
- | information on | 167
4 | details on | 63
6 | details about | 46
8 | info on | 31
10 | information contact | 25
12 | details of | 24
15 | info about | 16
17 | information on any of | 12
19 | information visit | 12
21 | financial information about | 11
23 | general information about | 9
25 | information on using | 8
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^Q^r Knowledge Acquisition Engine 

meaningful machines Sample Results Using a Corpus of English: 2.4B words

Concept Mining Results for: it is safe to say

Rank | Phrase | Relative Score
2 | it is fair to say | 24
- | you will find | -
6 | it's fair to say | 11
8 | the fact | 10
10 | it is very important | 9
- | it is important to note | -
- | also | 8
15 | it is quite clear | -
17 | it would be fair to say | 7
19 | we all recognize | 7
21 | it is significant | -
23 | we should remember | 7
25 | he will find | 6
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f^^r Knowledge Acquisition Engine 

meaningful machines Sample Results Using a Corpus of English: 2.4B words

Concept Mining Results for: country's largest

Rank | Phrase | Relative Score
2 | largest | 70
4 | world's largest | 25
6 | best | 20
8 | oldest | 14
10 | first | 12
12 | greatest | 8
- | nation's leading | -
15 | few | 7
17 | world's biggest | 7
19 | fastest growing | 6
21 | uk's largest | 6
23 | earliest | 5
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f^^. Knowledge Acquisition Engine 

meaningful machines Sample Results Using a Corpus of English: 2.4B words 



Concept Mining Results for: ceo

Rank | Phrase | Relative Score
2 | chief executive officer | 178
- | general manager | 35
6 | founder | 25
- | chairman | 24
10 | co-founder | 16
12 | general counsel | 12
15 | chief financial officer | 11
17 | vice-president | 10
19 | coo | 9
21 | publisher | 9
23 | secretary | 6
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1^ 



MEANINGFUL MACHINES 



Knowledge Acquisition Engine 

Sample Results Using a Corpus of English: 2.4B words 



Concept Mining Results for: terms and conditions 





Rank | Phrase | Relative Score
1 | terms and conditions | 969
- | conditions | 153
5 | provisions | 83
- | rules | 58
9 | requirements | 44
- | procedures | 28
13 | policies | 24
15 | limitations | 19
17 | standards | 17
- | tos | 16
21 | terms and provisions | 15
23 | following terms and conditions | 14
25 | site terms | 13
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y^^^ Knowledge Acquisition Engine 

meaningful machines Sample Results Using a Corpus of English: 2.4B words

Concept Mining Results for: rules and regulations

Rank | Phrase | Relative Score
- | rules | 61
- | guidelines | 28
6 | requirements | 23
8 | procedures | 21
10 | terms | 18
12 | laws | 17
15 | criteria | 11
17 | rules and procedures | 9
19 | the rules | 8
21 | policies and procedures | 8
23 | directions | 7
25 | laws and regulations | 6
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y^^^ Knowledge Acquisition Engine 

meaningful machines Sample Results Using a Corpus of English: 2.4B words

Concept Mining Results for: al qaeda

Rank | Phrase | Relative Score
- | al-qaida | -
- | al qaida | -
- | osama bin laden | -
- | al-qaeda | -
- | al-qa'ida | -
- | them | -
15 | worldwide | 11
17 | al quaeda | 3
19 | terrorists | 3
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Appendix A — Knowledge Acquisition Lists 



(Example with Full Results) 



Knowledge acquisition results for: it is important to note 





Rank | Phrase | Relative Score
1 | it is important to note | 249
2 | it is fair to say | 16
3 | it is important | 16
4 | it is clear | 12
5 | that | 8
6 | it is very important | 8
7 | we all agree | 8
8 | it is important to recognize | 8
9 | you will find | 7
10 | the fact | 7
11 | it is a shame | 7
12 | it is safe to say | 7
13 | it is important to point out | 6
14 | it is interesting to note | 6
15 | is | 6
16 | it should be pointed out | 6
17 | we should remember | 6
18 | it is unfortunate | 5
19 | it is quite clear | 5
20 | he will find | 5
21 | it's fair to say | 5
22 | it can be said | 5
23 | it is obvious | 5
24 | we all recognize | 5
25 | we have to recognize | 5
26 | we should bear in mind | 5
27 | it is well known | 5
28 | it shows | 5
29 | we know | 5
30 | he knows | 4
31 | the point is | 4
32 | i can say | 4
33 | it is significant | 4
34 | it is very clear | 4
35 | it should be noted | 4
36 | we will find | 4
37 | you will see | 4
38 | we all know | 4
39 | we must recognize | 4
40 | of | 3
41 | it is time | 3
42 | it is | 3
43 | ^ie would agree | 3
44 | it is imperative | 3
45 | you would find | 3
46 | about | 3
47 | everyone would agree | 3
48 | it is important to remember | 3
49 | we must realize | 3
50 | you will agree | 3
51 | we have demonstrated | 3
52 | you know | 3
53 | you'll agree | 3
54 | he said | 3
55 | i should say | 3
56 | it is extremely important | 3
57 | it is very unfortunate | 3
58 | it points out | 3
59 | people understand | 3
60 | the answer is | 3
61 | we should look at | 3
62 | also | 3
63 | everyone will agree | 3
64 | he is aware | 3
65 | he meant | 3
66 | i am hearing | 3
67 | i have demonstrated | 3
68 | i said | 3
69 | i would have to say | 3
70 | in view of the fact | 3
71 | it is absolutely disgraceful | 3
72 | it is apparent | 3
73 | it is great | 3
74 | it is true to say | 3
75 | it is very important to recognize | 3
76 | it should be understood | 3
77 | it would be fair to say | 3
78 | it's important to understand | 3
79 | it's wonderful | 3
80 | one of the reasons is | 3
81 | sometimes | 3
82 | the house will recognize | 3
83 | they understand | 3
84 | we all realize | 3
85 | we all understand | 3
86 | we can say | 3
87 | we have to remember | 3
88 | you said | 3
89 | you'll find | 3




i 






i 




mmmrn 


1 


1 





92 tfli»im only fair i ■«■■ ** g. » | f , .» g « . ..» 4^.|L4^-*2a >i 



WWW** * * f 4 » ♦ + ■*■ 4 »■ ^4 |... # ^ 



foo* llfaysdff *: Iff* * » ^ * « ~ * ■> -»-?TV» *■< ♦ -2 r 



1 



IP 5 wf datef #1 



I 



1X1 



I 



PL 



108 : u win*t#aa^id# > »-»^ 



109 



110 



however * * - 1 * • 



it important 



4 ■■-.r|4 



i * > * ■* +u « , 1 



w=«= 






111- 
















lStffl[gffi^gggilBi^jl iff f < tjKMIlii'I#g|i.li- 



i ir 



Mil 



- ■ 



M§ llthercis noques 



us 




124 | all Canadians would agree | 2
125 | all have said | -



126 



_i t_ 



"•'"'••'•"f «f< *• .(if *: ,*f * 

A <d» 4~ .A .. iU A a 



www 







128 | all of us feel | -
129 | all of us in this chamber are committed to the spirit of the legislation - i think we should do our best to | -


' ' : j ; ■ 






I 

rni 
r~ii 




i am correct in saying



144 



jl <■» » 



145 | i can safely say | -
146 | i have outlined | -






MM 



I 



i 8 a S fa == a i ! i LiJ a ; • 5 i * s= 



f * f 



M3f Itis impo|taiit»is 






l^litis absolutel y esseptial ; 2 : ~ 



159 | it is also clear | -



1 



Bam 



papriaige to §ay 



<» * - » # 4 



16$ itis lairly otear , 



11 3 ^ : ,, ... 



164 | it is important to emphasize | -



i 



* *2 



» ) 






its 


it is important to indicate 








W 








W\ 


Wi»«putonrecorc 








its 


it is important to say








ft*" 






1 




|LfD : 


f^tnoreimpoilnt 'M 




1 






it is malicious to think- 








If* 


it^s«ioteworthy.f 5 , 




| 










\ 




174 | it is quite true | 2




1 






















,, 




&m . -i - 'a 2 






178 




M- 'dm- h;MAt • 1 


2 . 









^^^ortfi noting . §/ ^ : s j 








m 




I 2 






m 


itpaistb|evidsitf 


J 1 2 






182 | it points out the fact | 2
183 | it says | 2












it should becles 


ft 3<t • h -V 11. ,' : ■ 2 . . * 


1 








iaJ^h^mthishous^P-'^ - 




1 




|186 


it will not be tq£ lon&befiye aliame^anij^iie ft ' 2 * '£ 


| 




|§§y 




| 






it's a travesty : ■"' • --<***> " ■ ■ < * • | 




I 






it's clear 




1 




N 


it's important to note | 2


1 




Iftgi ti 


.... . - - - 




1 


1 


1112 *fs possible ' M4M*«HM|- ; t .-n§f #, 2 1 


1 » 








1 1 i 






dfc . A 2- JL jJJ 


1 l l 


fell 


~r "T:2""7 rff- — ; '"""S"" VW — ~"1 
















M 


our thoughts today must be with these people most effected by the horrors of this war and


»|$r ^ "<f*s TO , ■ •- -pa 

J ' i 








r**-=* f " — 1 — 2 — ^ 






199 | people should understand | 2
200 | teachers of the christian faith and others should agree | -
201 | that demonstrates | 2










202 


ltM#ndi|ate# *■ - % - # i • id 






l 


fe 1 




nr* ar ^ n T ' 




\ 


Wa - 












— „ -r— — 






ir 


|he ansvger was «T * r ! : | r T 


r < !• 7 - r - ■■ 






#7 " 


|&e^M^tsV^p#^o^laresuffiGi^itl^; j§< ■>;{ 
iig#^#J.:. . -. ; * I , 1 1 i «. 






1 








209 




IH^. 








Wlinkts anartistic link 'and 1 ' " r ' ' s '" r |pT **"" " 






211 


&§jitbl§ter deserves credit for having *JN|® • - 












1 II 












:theministerlw0iag^ - t j , | | 2 
















116^ 


the reality ^. J,. ; 4 . £ . 1 , i.. 


U ■ i 21 1 






217 | the reason for my fascination with this detail is the almost overwhelming sense i have | -
218 | the reason is | 2
219 | the reason lawyers are over-represented in the house is | 2
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ill 


'the tiling ^ - r fs — — — y *f ' ' \r 






! 


'A .■ , 








1 


223 


i^«i^#l»1ii^r pstldafwith 
some suggestion • ; „ „ .„ j, # L m 








224 


W^iPtoPgism:' III 2'.--T, I'- J! 






1 




(th&rtfbrl f f -V •* ! * ' - : # 




II 








II 








128 < 


ftfi#have • • . M 




II 


W%MUtmMm^M m-.l & M i L. * ... 


!l . I* I; ' 


1 1 












1 




|32l 


thilis|o m « * f ^ 




1 




^S3 : i 




1 






this underlies the fact


1'"' T 2 ; 


1 








1 




236 | we agreed | -
237 | we are all aware | -


-i | 







242 | we have seen | 2



C 



rr^r fr 



|T ? 



...11 ™„ 



,2 




can 



all,- tM ■ 



we have to make sure | 2



we have 



51 • hp.^PMP^Mm 



= ...a -ilLA. 



^jy^Tmust remind the government , 



_^™™™™, t ™ — „ — „ — . 



Ml 



— — ^ 



we need to recognize

we should be clear | 2



52 H 



I2I3" 



we should be frank in saying 



we will all agree 



2 



^5" 



we would all agree
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236 


we'd all agree | 2








j^hatwe^re saying is . & 4-^ MM ^J , # dft 4" 






w 






1 




|XouAayg;tp remei^eii- 


i -fa 4 i, -1 # A 2 -fr &> 




Ifo^foufealie V w * * ?* 1 f If *1| If >*2 'f'"' * 








jyouwill have to agree; 






1 




tike 4 *'' W " W'-'W" *' * * " , *"---*|- HSr-^-j ; P *p 




1 




¥ - I- % ' 


fe^afc & a .> < Ml 








ft* 1 


R«re ****** 


P rf % <f* ' j ^ 




1 


# 


| is unlikely f ^ m A- I w» Jv li, a 1 «, *. 








we have ^ W" * v ^ ^ " ' %■ ] 




1 




Isomeofyouhave^ueledoji .1 . I. . i ... J.,. 


;f li.l | 




1 




(those are, #M 








' 


|refiave# fale .4, f j; J g, . J 










jit has to do with ^ - #A^s§^ 


J 






271 


|it is lhameful 


; II ' ; 1 ' 






272 








1 


273 | the country has really missed out on | 1






274 


tejgvilisee^ i 
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|daye ensor mentioned something li 



ike 



I 



1 asa-iii 1 



lffielBH (jK^jT to WlcjjTtofo s^picfol" 




in view of : ; - - jj 1 



s sad f ' T ~' T 



si 



IB r 1 T - ' * « :t 1 " f 



fouU be J[on|||ful if we could get what's left of], ^ ? fb*#» 
; african americans and | 



itibecause^e.hatetiaccept 




mm 



Ml 



I al^hatfbesls pro^e~^ yT 



I 



jib 



M am^ca#pe#le ajree with fee president "|T~-1* y -IF" 
there is j 







iter 


it is worth-while to gW 








IV- i 

313 


it's possible that we will realize that 












f$^^f^®0*P , U P %;ajpa^.a^hei 
aarenaline iii tne roorrt- which wasnt hard ! - arm i 










Sa^iWinWs»se 










316 


thy&SSfltoh. CjSfreafues who are d^$Ag40&ifflj$ *M 
with iran should also be concerned fyer m ^ 1 








« .... .1. 


^Be#f the interesting things i#| 


IMfl 


i i 






PI 


^ne^of the^tbings that carte ou£of the npr'is T 










that ha view of ,.; ,jr » i ^ , i .fa 








320.: 


that's 






1 




[321 


iieb^tegdem-W^ 








| 




[322 








1 






|ie^-te^a#4ie^to5sendf^^a|itothe f- 

W^IS dL-t 'jL , IlL i A & 2, 


4% If 






If 4 


opportunity to £ut throu^ 




1 








-^r ! ^m-T!^! 






1 




|326 


we have tfte^ar^of ' * ' ♦ ' -f r *' 




1 




[327 


we will find out 






1 


l 




|328 


what has to be realized is


& «*■, 


1!n I 
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~< ^ , 5--. ^ ~_ ^ - 

what it reflects is


- 1 t r 






33Q | 










vou need to set over ffe- -ii***-" > - f 1 # J 


- ] 




[SP] 




"i 

J 




— 1 

% 


a lot of the institutional investors are buvirifi ftibut i 1 * 

think**. 4* -* • A 






3*1 




LJ 












Sljfc 1 








337 I 


all of us recognize






.... 








33¥| 








340 


afev#OT^«^em| K< |J| ; ; .| J{.. Ji., M^-.ii 














342 


at this point there are only two questions


.., r IT 
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[all those present would agn 












jail those4vho*av4feyiew!( 
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(allies ' v *' 


w^f. 1 W w ?f| •' w w w 
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It* , 


P women lhf in |ar erf that lump and ,.• , #|| * . ^1. 
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^lutoni^alrea^be^madeto ? * T '-*| * ■*!«*• f 


i 




|680 


allusion has already been m 


iadet8thi.|act 4 1 


1 4.1, ,.. 


1 
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almost any fair-miiided person will come quicfely i$ 
the conclusion 4, 


t -X -* 

1 

_i _£ 
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atfMtMMfe agree with 4- * # 
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al^oMg^ficagcell # f 4» 








:: IP' 


also that one of the bad byproducts of the present inflationary situation is


I ^ 
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*'- 


also the relationship between Canadian embassies and our workers in some countries like that is enhanced by virtue of
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also we have to recognize > r | 


i§ II I 1 14ft #1 
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also you have totalize ; 
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amd was lof King to cover s % " 














r i 


Hi 
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M "T"- ^ w 1 ~$ — * — — — r~ • ■ ■•■ • 

americans ajfa.wfa englishiare go stmdn^ly 
committed to the concept of free speech : 1 


?" r 
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r-*" — -|r- -np ■ t tt- -rap-" 

americans realise & |; , . .1, ,1 j§. 
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know f ,4,: i s, ,j, 4.., 




"TV"" .- 7|fT 
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anflofe -f' • **- # ■* if t *' 1 


1* 1 1 # -| * ; :i 
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and i can only do 
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andy made good use of the area and avoided 


|l ; i 1 
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ps a .4 tell v- 4 -j 
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rfason lhe biU should be supported ir^l » 



1 ' I 1 



TOT^IIntonio is the:goWguf ano^ef- -f 7 ^ 1 ^ 



/ careful reading of the bill will indicate j| 1 \ 



Madependemobserverwouldconcde. €~ |[ "F ■ i\ 
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»• 4 , " , 11 a ^1 ; - 






714 | any number you came up with would depend on the retailer involved | -








715 | any of us who were advocating wage and price controls or an incomes policy in 1974 were quite ... | -


f • ~»— •* # 
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— — ^ , r ^ — _ E _ ^ — ^ 

any other indication isjnot fair and is misleading 
Canadians to belief ^ 'f"'f #*Jr*lM 
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— !~-^ rg-> i — ^p-: — f?^: tt ~~w< *T~* w< Iff ~ ~ y — \ 

any reasonable citizen would recognize | 1
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an£xeasonable citizen would recognize ftij^fa iS£ -i • & , ' 
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any true 








f2#" 
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anyone who approaches this issue with an open 

i^x^fe^ave^toadhiit *; ^ " * > -1 


t • : 

r 1 








anyone who examinesithose-propositidns would! 

accept* ^ */ j:i;; i|r r W| «l # ' IP * 


■r — vv tyrw 1 








a^onlfsvho lias Mad even S feliAse 61t male ar 1 
pifessional study 0^«e#o^ii^ S % § < 






— .jJL 


anyone who has had experience in the past or 

r^ponsibility fownanaging systems which are 


i : 1 
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proposed will agree w - * t j f - 
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anyone^ho|^a4s the published article will agree 


[ ■ # -li 
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arafat is personally responsible for | : 1 






729 | are capable of producing a document to give the people hope | -


1: 1:. 
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ar&doing a trapendbus job we want more healthy . 








n 
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are more important than u % , 









73| | 


^^gopd^featuimof rt> 








m 1 










mi 


as' a government ageocy § - ill § 




i 




i — i 

7i|t 

i 


t^rfree|oans and their taxable loans in these 
^^ignWan^s and average the foreign tax cre||r 
ofetheprofiton all of those loans so , .§ . 


. - ip... 

' •i ' 1 v 

Sit' , , , fcfe, ■ , . 
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as a member of parliament and as the responsible 
mpister ; t yi ^ - ■ ■ i 


ffM 'ft ? 

, : v ■ ■ =* '■ 








S Sln g *ch* re nft rc eS r T ? IT I f 
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a^meilas glie ofihe ileal ri^Sia 1 ^ stalled td ^ 

%1t- tiftb- a4ftl< 'Jl , i ifc. .>w ;, # A : 








742 | as we get into committee on this bill we will see quite clearly | -








743 | aside from | 1








at. ,r : 
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at least * H * #. # • f*< §Uf§ > #1 § 
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, , ? . , „ . n r— — w rf fnt — y*^ — i — t — | 

atl^ast4J>itpfit|ias|odo,spfcifiGallywith j J t 1 ;: 


1 






|t1iae%# dfctdnf i wis af rtet^g^f 




















at the motoent yt>u l^Wyou've gf t t^ssttejc with -- 


if; 
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_ .. , ,_ ■ Y~ V 1- > ..IT"... 

at Oi§ satae ^qae pctftso Jbav^to liQCkfctaii ... # . i 


1 1 . * ' 
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Sms|tiefme^fave%mk% f^^^l^HT # 
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[at lis pOinim# remind ^sMb^^Jl Aft^ S A j 






l3j 


^ . T ~ ■ ^ r— T~ — —W~~ — - "^T*" 

atthis point what i Wl ould say that questionis ; 


















at timesanour export oppprtumtres«fef0jd d m , ^ 
products we|do mot take full advantage of 


1 ■ ;% * 
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|tf jfctefitt#r export opportunities for fo|df«§| . 
m^k. wldo not take f full advantage of the fact;, 


r --?•*••• -|r iff. ^ 
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ba^ oier-#e fa|t #' •# -4 i.--^- ».« 4 15- 4- & . 


1 






- — —r—^— — , _ — ^ — ™^ rj- — "«t>'' y» — ™ " 

back to those times as 




1 






balebai r Wdkd*'be atot'mor^funfeo watctif Menew 
theje was am, oujside chance the fxpc-s could be in 
the playoffs or 


*<. # I H 


1 
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basically * ** -1 * * • ~| * « I .j. 


1 
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m. IbiicalWt mm application^* -,m . ib^M- Id <jk 
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' ? |* |blickrpW*a#id^lti^ecfl^i|viir 
enable^, to lad ifAf . ,|,- X, , JL - 
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1 




7# 1 

- ik 1 


bPdSpa^db§« tallied ltMta$*efNrtfffii!l 
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^^^^^sueiefbi;^. .AM >-:i» ■ 
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769 , *|tb^ter'f5e"r«a^%'-* < ||«--^- ■ *» -•*' * 


t, 4' a r 








Wo^sief yeiflhep^ei#^apt 1 
&(Y^B - "Ml? Ml- * * ^- * ■ i J 








771 | bill gates and steve ballmer have got to be dancing a jig and giving each other the high five over | -










^^^^^^te^^^^^^^e^^o be;dan< 
jig and giving each other the M^h five ovSrtlt 








773 Ibortod-pascalmaClrf hlsmfokeSfy aged J, ||, I'' ! ? 


II 


774 | both | -
775 | both gay and straight people need to know | -


II 


7* PihinfteWaVfliSawfewritfen-fsee ** T ^' I 1 ' 


II 


777 | both nations recognize | 1
778 | both sides are to blame so we are not trying to pin


l 
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all theblam#on one side ol the oiier~that we rs 
FouW^Pfi i; 4, h m - 
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bothsidesiuiderstaid ;V i< /*, . M 


Ml . - Jk* iJL,, 
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both the mngster ai&Ufl& , -J- ;.' , Ml 








782 | bramalea is a model for | -
783 | briefly outlining | -


u. . tf : - 








bringsj^oggus J . ,| ; , II J «• 








brucf^apl^irep^entji;. £ | & k .uii 








by a little snooping you could find out


al, ;.-i*r -Jti-. 






m 


b^l^^^^^ ajg^^8r|^^> give you a booklet 








ll«lWl|«a^Mllf'W ; W ■ I 1 
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W •^^•'ad^^Mipppipt to gost oEfel 
staff » this |iea =T ff T ■ ? 


IP ip |. 
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bythtnatu^fitald ^ % V ^ [f 


IT r 
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by the time " : : *T f ' ' f 








792 | by virtue of | 1
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byv^ueofiiefact 








794 | came up very directly with | 1






795 | can be expected to go better based on this motion before us because | 1

1 a 
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can be made aiid as long as i think <V 


1 
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can tills&id isP 


v : :> Iff 1 






798| 

"If 


be on their side express guilt and anger over the 
brrible circumstance that ocurred tliis may bejiard. 

lutiMghtmwF ** 
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799$ 


feaftiada has come^to^ 
realize 


point in time when we hf^e to 
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8oSf 


— — — ^ -5 y "yr: """"" — -r: ■ -* 

canaapias decided for good reason " | || f f"; - 






toff 








802 


Canada has to take 


1 1 ft i 






803 1 

" 1 




rfegedfeAave headquartered 

«, | *fe iff 


ra 

• ••• ,¥ 
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ia^dshouJdbeproi 


ad of 1 fl vj[| 1 | 










testimony to , : ; _ f| ; , jj .1 . J 






806 i 


cana^|fa^fe>yi|] 


iteatly^nt-i,. jft, .Ife .-A 






in 


canadiMs ar^beginn 
problems and are real 


ng to get some grasp of these 

izing • :f. 


j k 






808 | Canadians are going to find it very hard to accept | -






809 


canadipns are intelligent enough to grasp the 1: ... 
dishonesty an<f the la|| of conwctioii IhpUed^ * 
such a stand and to realize also 
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Canadians are starting to understand 
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811 | gfl^ans can come together on * ,; k % 


I . ■■-4,;-. 




812 | Canadians can find reassurance from | -
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813 IcMadians know i! 


1 f 


! 


814 | canadians sometimes do not understand | -
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815 |(ciimiansunderstariM s ^-f :; - : - v : ' : ''■ M 








816 






padians were surprise, rofind ; 8 „ | * 


817 


Canadians were very clear when we had the turkish 

wlp were not even able to make a coherent claim 
[fogplrsecution ortl^lbblems we had with 


i 
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818 Icanadians will appresei&te very we are fully aware of 


: '■ if; 






v&WRftm wiiiieiiai|H:i \ - ?rM 




II 


&lcanadianswould accept 




pr? 1 

1 1 
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821 Icanadians would dot well to consider is t 1 f 


■ ; - •¥! .■ 
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Li-fa 


yplidians would want to be very chary aboi 
[getting into any discussions on that kind of] 
Ibecause it seems to me 


Proposal 








- 

823 : 


ea&b the customer can still order the items by credit 
[card or take their ?hM#s by sending in the order by 
fenaii;ia hopes , I , -|jj> r M \ ,„ :r % : t 


. - 
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itff 


(certain' studies show " > 
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jcertainly effective s ek education is | 1 




i 


826 | certainly there is the realization | -
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827 


chairman greenspan was quite accurate when he


| f ! 
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said ; ; *'f | 








828 












829 | common sense dictates | 1








m 


consideration should also be given to










831 | consideration should also be given to the fact | -








832 


considefation shoitf "Mso be given to tleii^ 
is about time we recognized 


that it 


l-'i v 






ijflPf.' 


content views would d.ctate 


If • • - ^ 1 .'.m. ■ 






004 


continuity of catftijl&plicit ijlii M. 


1 1 'Ml. 






WT 


credit should be given to the joint committee which has laboured on this matter and provided a focal


11: 1 % 

«i. ii4 








■ •••• y- :~?:~~ m - ■" - — ...... — r ^.. 

cross media brands are hard to manage for just


n. 1 ' 4 -f- ' 






837 


crown- prince abdullah should be congratulate 
thanked for putting this on the table as a way 

breakingithrough '|J| % 






_ _ _ 








lah^ihis mostly right when he says « 






deals with a problet; ■ : |f. ; M ' : 










— — - — - ; •>»••••-•*—• - — — *- - - - - - — — 

demonstrate ' " * 










demonstrate the great energy 


1 aL^u . -J — - — 








derives from % 
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development of alternative assistance andlle 
availability of alteiiSlte moneysfor advertising 
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revenue willi1|| of veifgreat importance in te^s of 
ensuring v jf 










didriothapp^ " ^ -ft'" - W'" ' 










845 


directly - is '/ ■ ' ' ; f$*? " 




r i n 
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disturbs me the most is| r . 


- : 


1 .J 


1 


I 
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841? 


disturbs me the most is the fact 
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do noishare " ; ; : ; : ' ' ;| % ' ' i x : iff 








s 
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dr. do&ing indicated and 


1* . /*J . II ! . ■^■ u > 








85#f 


due to 
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months but ofihe last two or three years i have 

i^pme^aw^! /fM - * 











852 


e-hck&Lare nice because . s** , „ fi, , . , II; , ; 


; M; ... 






..nr. ,„M; : 


each of us in the house of commons was a little bit
shocked to discover the reason the leader of the new
democratic party on may 18 got up in the house of
commons to demand












<each one of «p the family are 

to know '--vl- 'V^i isi 


intelligent enough 










^arly§|. . j|I , : |,i : |§ 


i f i 4 ■ ■ I'-- 1 
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856 


economics is mute on what it is 


4 1 ! i 
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elaborate on 


r i 
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english my friend would say 




' f 
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e ! Q B ally|piMpt see.aspa^ampt ^. J 


■Ms jA 'm , 






■ 




n ■ 
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essentially business interests wanted tp make sur4 


— 






861 


<m a spciaUst : would realize k x % . , , a , j 








,;'|. : ' 
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even he would admit ( f ;£ f 
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even its,managemenyo^ J| rfc3Ti,= J 






IP1 


j^ffllllb^i?posite would know |i ,1. 1 


i ijSi;; <>M:*-f , 






867 


^^^pfl^sfei^r^^n^lre are aware of th -M 










868 


even publishers working here are aware of the fact 








869 


Illi t^^oviSi^lurilWe ife^fpil^ 
lose something because Canadians would begin to 
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even the most ardent prop©hent|>f opjsn go^#rnm$nt 
action agrees,:) . || N JIh§1|| : ||Mi§ 


f y-< r t 
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even the parliamentary secretary who appeal to 
|§e re|^ns«itylrthe*m will agfee f : | 












eyfnyolwill^ant^agE^wif met ti 










m 


e^nyojv.cojAneni.demfJistite. ffe 


.ft * ■ 






874 


eventually they're realizing


f ; ■ • § # 
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every country in the world has demonstrate! 


} — w: J ; 
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ev|ly gly' looIS forward tl 'fatherliooWailif 
^Pt chaflce tedo h ,/ 


having 

t *4 

i. m 
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i .r ... 
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878 |^it^iB#0^t»usf reac§y a^es > 


if 
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W Pery member of tWou^«i^ r ^ 
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w — 1 ism 
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m >*- " * 
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every team in this conference can attest to 


f . ' | 
HfcL -*gjk. 












1 




everybody here can agree
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884 ievcrybody in the region knows* **> ■ >* 
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fV . „.,..,,. . - .. ■ ■ • ■ - .. ^ 

ieverybli? is^ncea$ied 4put *t ; Tf? m : ^if 
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Everybody knows : "f T ' T 


? '-' --fl" ^ — 
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»fe j 


[ ' 1 "ft" — - --v-- V"-— " •r^rr:- n? 

?feybMy,teEM M. -!;> * 
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everybody ree/}gni^.Jha^Bae^|tlje#^^ 
Miff fpolvfig any of tfcfse problems is 
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everybody would have to admit 
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894 (everyone agrees i 
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8?§ everyone agrees that 4 . .t 


it.- 
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893 l everyone felt # IV- -J 
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e\retyone has had j ust enough of a taste-sor^ have 
is run not to be willing to accent the government's 



: 1 




907 evidence is mounting which indicates

908 evidenced by

909 evidenced by the fact
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|910 experience has shown | _1 
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experiipce has^shown that this matter w|tart ep||> 
wMyfurmlufand 7" 




II 


nil 


explained (fcf 1 . ■ * J** ; r* i 
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explai§ing .iij .iii : -i • .-4J» '• &J 


■v.>;l>.i v& 
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^ 5 ^ rs?T — 

far exceed the perceived problems in west


: 1. 1 I . 
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91$yifew wl argue . ,:L i k\ 









916 

f 1 


first he's got to focus on electoral reform and he's
got to face
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first of all if you had people like : • , | 
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[first thk it woUbe weftfbr t»ngre» loofcll 
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jfet we have to recognize , y 
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|firstyfuW#**oghIe :§»• ,1 
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|fQr,h^.|nce^-t« , „»y.. ,. .Aii, ..jd 
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[for me that experience fundamentally demonstrates 
las do I 


1 : -?=- 








for my part that underlying the committee report
was


, 1 # 
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for myself to have at least a fair sense of what the ; 

tarifjfeffects of this proposal would b^likel^to bM| 
one would want tcraear Irom groups sSich as the . 

W>rm^k$^mm^i JiL, .ii. ik 
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particularly ppsetlfy < 










930 


for tSobvilzs reason ^ ** T ^ ^ 








931, 
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for the public this magnificent Building is a : ; 
ipninder <c < iMk . . - , *»4; A.; iJL 
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bommy perspective and^^^strengtheo^, 
me and allows me to continue |orkin| as a* 

f°WS^M, . , !■ ife ...^ j.t ,,, ; ,A; riik -.ait. 
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from our standpoint _ . , ., 










fromkhat:lia>ti^ ni.the house today #J 1 # 
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lfrom%hat» have seen . ■ ; : ' | 1 
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•from what we have seen today 
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jgaryhad sent us a brief rnen&f 
USO# : 1 ;:|. :..| 
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generally § . §j . 1 t ^ , 










generally that the canadian public has come to
understand
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asonsknew^' ft 
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gh hl§ don^better stofy Ines w 
but i don't feel 


iththef 4 musketeeT 
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gives an indication of 
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944 



god used me to show them



945 



guaranteed 



gfjjO fcas bgffl a ^efaLtpol for ensurfng ~j ' , ~ 



jjxl|h^h;|eno#rloo^is,i^ ; ..M, 



— ■ - - ; 



bs^ghj^lh^ien^aj^ 



g493 ^as$ai4manytirites ; . J , 




-one is -E:. .f - 



'53-'ihavm¥aR 



'^i'w i wi iWi ■ l.rf, 
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95 ; , , .... attlpeallrc^u^p iShild^buld^ 



having the police give their cards to the people will



T-fT — 
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ie -las judgmenlwas consistent witf Wri^ ^} 
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he 



ithme 
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he agrees with us 
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§61 fie aioc^tatlhe^w^id html -for *f^jf-_jjpt- » | 



1^. [the and the present minister of finance violated 
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|64 ; ||ie can aUyigr^^ ijfe.---^* A>- ■ •*■<■- „*M M ■ ■. A* 



965 he can probably emphasize and underline
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966 |jhe can probably emphasize and underline the fact 



gi fes c# re gain d - 
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le could have been modest encktgh jp recognize j 




hejuld hay^y^ ^icke^ v flimgs aM f y^|^ ' jH 



ind,pfma^ 



he dealt in large part with

I ■ tit ^li , 
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le decided 
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he demonstrated 
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he didn't want to leaye me alone - that's 
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he even tied to mount a 
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h efjls ? 
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j982 jlhefocused very briefly on ~|| 1 
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ae forgot about the marketing boards and 




he got through * 
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fee,j» a valid T#int today whence referr#fto ,& ji . g. 1 ^ .... ft 
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[he has deitt wiA 
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jhehls decided 
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Appendix B - 

Example of Translation Using Parallel Text and Overlap 



Attempting to translate (from English to Spanish): 

you can also rename the file and write code that affects the project in order to 
complete the application for information on creating applications 

Checking db for: you can also rename the file and write code that affects the project 
in order to complete the application for information on creating applications 
found in 1 files (took 0.085 Seconds) 



Checking db for: you can also rename the file and write code that affects the project 
in order to complete the application for information on creating 
found in 1 files (took 0.082 Seconds) 

Checking db for: you can also rename the file and write code that affects the project 
in order to complete the application for information on 
found in 1 files (took 0.082 Seconds) 

Checking db for: you can also rename the file and write code that affects the project 
in order to complete the application for information 
found in 1 files (took 0.084 Seconds) 

Checking db for: you can also rename the file and write code that affects the project 
in order to complete the application for 
found in 1 files (took 0.082 Seconds) 

Checking db for: you can also rename the file and write code that affects the project 
in order to complete the application 
found in 1 files (took 0.082 Seconds) 

Checking db for: you can also rename the file and write code that affects the project 

in order to complete the 

found in 1 files (took 0.082 Seconds) 

Checking db for: you can also rename the file and write code that affects the project 

in order to complete 

found in 1 files (took 0.082 Seconds) 

Checking db for: you can also rename the file and write code that affects the project 
in order to 

found in 1 files (took 0.082 Seconds)



Checking db for: you can also rename the file and write code that affects the project 
in order 

found in 1 files (took 0.082 Seconds) 



Checking db for: you can also rename the file and write code that affects the project 
in 

found in 1 files (took 0.082 Seconds) 



Checking db for: you can also rename the file and write code that affects the project 
found in 1 files (took 0.082 Seconds) 



Checking db for: you can also rename the file and write code that affects the 
found in 1 files (took 0.082 Seconds) 



Checking db for: you can also rename the file and write code that affects 
found in 1 files (took 0.082 Seconds) 



Checking db for: you can also rename the file and write code that 
found in 1 files (took 0.082 Seconds) 



Checking db for: you can also rename the file and write code
found in 1 files (took 0.082 Seconds) 



Checking db for: you can also rename the file and write 
found in 1 files (took 0.083 Seconds) 



Checking db for: you can also rename the file and 
found in 1 files (took 0.082 Seconds) 



Checking db for: you can also rename the file 
found in 1 files (took 0.053 Seconds) 



Checking db for: you can also rename the 
found in 1 files (took 0.048 Seconds) 



Checking db for: you can also rename 
found in 4 files (took 0.047 Seconds) 



Checking db for: you can also 

found in 1000 files (took 0.032 Seconds) 

□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ 
□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ 
□ □□□□□□□Will check 100 files 
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File comparison took 4.865 Seconds. 
Frequency table for: you can also 



1    1 docs   69 times   tambien puede
3    1 docs   10 times   informe
5    1 docs   8 times    tambien
7    1 docs   6 times    web
9    1 docs   5 times    valores
11   1 docs   5 times    un informe
13   1 docs   5 times    tambien puede utilizar
15   1 docs   4 times    componente de informe
17   1 docs   3 times    clic
19   1 docs   3 times    codigo
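The lookups above appear to follow a simple pattern: starting from each word position, the query is shortened one word at a time from the right until the remaining phrase is found in enough files to build a frequency table. A minimal Python sketch of that pattern follows. The names `scan`, `longest_frequent_prefix`, `count_files`, `min_files` and `min_words` are hypothetical, and the cutoff of 10 files is an assumption: the log only shows that phrases found in 6 or fewer files were shortened further, while phrases found in 14 or more files were accepted. The toy `LOG_COUNTS` table copies file counts from this example; every other phrase is treated as found in 1 file.

```python
# File counts copied from the log in this appendix; any phrase not
# listed here is treated as found in 1 file (as the log reports).
LOG_COUNTS = {
    "you can also rename": 4,
    "you can also": 1000,
    "can also rename the": 4,
    "can also rename": 33,
    "also rename the": 4,
    "rename the file and": 3,
    "rename the file": 117,
    "the file and write": 6,
    "the file and": 664,
    "file and write": 14,
}


def count_files(phrase):
    """Hypothetical stand-in for the database lookup."""
    return LOG_COUNTS.get(phrase, 1)


def longest_frequent_prefix(words, count, min_files=10, min_words=3):
    """Drop trailing words until the phrase occurs in at least min_files files.

    Returns the phrase, or None if even the min_words-long phrase is too rare.
    """
    for end in range(len(words), min_words - 1, -1):
        phrase = " ".join(words[:end])
        if count(phrase) >= min_files:
            return phrase
    return None


def scan(sentence, count, min_files=10, min_words=3):
    """Repeat the shrinking lookup from every word position in the sentence."""
    words = sentence.split()
    hits = []
    for start in range(len(words) - min_words + 1):
        hit = longest_frequent_prefix(words[start:], count, min_files, min_words)
        if hit is not None:
            hits.append(hit)
    return hits


hits = scan("you can also rename the file and write code", count_files)
print(hits)
```

With the toy counts, `scan` returns the five phrases for which this appendix builds frequency tables: "you can also", "can also rename", "rename the file", "the file and" and "file and write". The start position "also rename the" yields nothing, which matches the log moving on without a table at that point.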









Checking db for: can also rename the file and write code that affects the project in 
order to complete the application for information on creating applications 
found in 1 files (took 0.038 Seconds) 



Checking db for: can also rename the file and write code that affects the project in 
order to complete the application for information on creating 
found in 1 files (took 0.038 Seconds) 



Checking db for: can also rename the file and write code that affects the project in 
order to complete the application for information on 
found in 1 files (took 0.038 Seconds) 



Checking db for: can also rename the file and write code that affects the project in 
order to complete the application for information 
found in 1 files (took 0.037 Seconds) 
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Checking db for: can also rename the file and write code that affects the project in 
order to complete the application for 
found in 1 files (took 0.038 Seconds) 



Checking db for: can also rename the file and write code that affects the project in 
order to complete the application 
found in 1 files (took 0.038 Seconds) 



Checking db for: can also rename the file and write code that affects the project in 

order to complete the 

found in 1 files (took 0.038 Seconds) 



Checking db for: can also rename the file and write code that affects the project in 
order to complete 

found in 1 files (took 0.038 Seconds) 



Checking db for: can also rename the file and write code that affects the project in 
order to 

found in 1 files (took 0.580 Seconds) 



Checking db for: can also rename the file and write code that affects the project in 
order 

found in 1 files (took 0.038 Seconds) 



Checking db for: can also rename the file and write code that affects the project in 
found in 1 files (took 0.038 Seconds) 



Checking db for: can also rename the file and write code that affects the project 
found in 1 files (took 0.037 Seconds) 



Checking db for: can also rename the file and write code that affects the 
found in 1 files (took 0.037 Seconds) 



Checking db for: can also rename the file and write code that affects 
found in 1 files (took 0.037 Seconds) 



Checking db for: can also rename the file and write code that 
found in 1 files (took 0.037 Seconds) 



Checking db for: can also rename the file and write code 
found in 1 files (took 0.040 Seconds) 



Checking db for: can also rename the file and write 
found in 1 files (took 0.039 Seconds) 
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Checking db for: can also rename the file and 
found in 1 files (took 0.037 Seconds) 



Checking db for: can also rename the file 
found in 1 files (took 0.008 Seconds) 



Checking db for: can also rename the 
found in 4 files (took 0.003 Seconds) 



Checking db for: can also rename 
found in 33 files (took 0.002 Seconds) 

□ □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□Will check 33 files 



File comparison took 1.774 Seconds. 
Frequency table for: can also rename 









1    1 docs   10 times      tambien puede cambiar el nombre de
2    1 docs   8 times       a continuacion
3    1 docs   7 times       puede
4    1 docs   7 times       [illegible]
5    1 docs   7 times       puede cambiar el nombre de
6    1 docs   5 times       [illegible]
7    1 docs   5 times       puede cambiar el nombre de un
8    1 docs   [illegible]   [illegible]
9    1 docs   4 times       clic en
10   1 docs   [illegible]   tambien
11   1 docs   4 times       si hace clic
12   [illegible]   4 times  [illegible]
13   1 docs   4 times       clic de nuevo de forma que se pueda editar su nombre e introduciendo el nuevo
14   [illegible]
15   1 docs   4 times       de nuevo de forma que se pueda editar su nombre e introduciendo el nuevo nombre
16   1 docs   4 times       haciendo clic de nuevo de forma que se pueda editar su nombre e introduciendo el [illegible]
17   1 docs   3 times       de una
18   1 docs   3 times       igualmente
19   1 docs   3 times       hace clic en cambiar nombre
20   1 docs   3 times       puede cambiar el nombre de una
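Each frequency table above lists target-language word strings together with how often they occur. One way such a table could be built is to count every word n-gram in the target-language text aligned with the matching source segments. The Python sketch below assumes exactly that; the source does not state the counting rule, and `frequency_table`, `ngrams`, `max_n`, `top` and the sample segments are hypothetical.

```python
from collections import Counter


def ngrams(words, max_n):
    """Yield every run of 1..max_n consecutive words as a single string."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def frequency_table(target_segments, max_n=3, top=5):
    """Count word n-grams across target-language segments, most frequent first."""
    counts = Counter()
    for segment in target_segments:
        counts.update(ngrams(segment.split(), max_n))
    return counts.most_common(top)


# Hypothetical target-side segments aligned with "you can also".
segments = [
    "tambien puede cambiar el nombre",
    "tambien puede utilizar el informe",
    "tambien puede",
]
for phrase, times in frequency_table(segments, max_n=2, top=4):
    print(f"{times} times  {phrase}")
```

Counting n-grams of several lengths at once is what lets a table hold both a short string such as "tambien" and a longer one such as "tambien puede utilizar" as separate rows.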
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Potential translations (using overlap) for : you can also rename 



1    you can also     can also rename
2    tambien          tambien puede
4    tambien puede    puede cambiar el nombre de un
6    tambien puede    tambien puede cambiar el nombre de una











Checking db for: also rename the file and write code that affects the project in order 
to complete the application for information on creating applications 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects the project in order 
to complete the application for information on creating 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects the project in order 
to complete the application for information on 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects the project in order 
to complete the application for information 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects the project in order 
to complete the application for 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects the project in order 

to complete the application 

found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects the project in order 
to complete the 

found in 1 files (took 0.038 Seconds) 
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Checking db for: also rename the file and write code that affects the project in order 
to complete 

found in 1 files (took 0.038 Seconds)



Checking db for: also rename the file and write code that affects the project in order 
to 

found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects the project in order 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects the project in 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects the project 
found in 1 files (took 0.040 Seconds) 



Checking db for: also rename the file and write code that affects the 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code that affects 
found in 1 files (took 0.039 Seconds) 



Checking db for: also rename the file and write code that 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write code 
found in 1 files (took 0.038 Seconds) 



Checking db for: also rename the file and write 
found in 1 files (took 0.035 Seconds) 



Checking db for: also rename the file and 
found in 1 files (took 0.034 Seconds) 



Checking db for: also rename the file 
found in 1 files (took 0.007 Seconds) 



Checking db for: also rename the 
found in 4 files (took 0.001 Seconds) 



Checking db for: rename the file and write code that affects the project in order to 
complete the application for information on creating applications 
found in 1 files (took 0.045 Seconds) 
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Checking db for: rename the file and write code that affects the project in order to 
complete the application for information on creating 
found in 1 files (took 0.044 Seconds) 



Checking db for: rename the file and write code that affects the project in order to 
complete the application for information on 
found in 1 files (took 0.044 Seconds) 



Checking db for: rename the file and write code that affects the project in order to 
complete the application for information 
found in 1 files (took 0.044 Seconds) 



Checking db for: rename the file and write code that affects the project in order to 
complete the application for 
found in 1 files (took 0.044 Seconds) 



Checking db for: rename the file and write code that affects the project in order to 

complete the application 

found in 1 files (took 0.044 Seconds) 



Checking db for: rename the file and write code that affects the project in order to 
complete the 

found in 1 files (took 0.043 Seconds) 



Checking db for: rename the file and write code that affects the project in order to 
complete 

found in 1 files (took 0.045 Seconds) 



Checking db for: rename the file and write code that affects the project in order to 
found in 1 files (took 0.044 Seconds) 



Checking db for: rename the file and write code that affects the project in order 
found in 1 files (took 0.044 Seconds) 



Checking db for: rename the file and write code that affects the project in 
found in 1 files (took 0.044 Seconds) 



Checking db for: rename the file and write code that affects the project 
found in 1 files (took 0.044 Seconds) 



Checking db for: rename the file and write code that affects the 
found in 1 files (took 0.043 Seconds) 



Checking db for: rename the file and write code that affects 
found in 1 files (took 0.044 Seconds) 






Checking db for: rename the file and write code that 
found in 1 files (took 0.043 Seconds) 



Checking db for: rename the file and write code 
found in 1 files (took 0.037 Seconds) 



Checking db for: rename the file and write 
found in 1 files (took 0.036 Seconds) 



Checking db for: rename the file and 
found in 3 files (took 0.034 Seconds) 



Checking db for: rename the file 
found in 117 files (took 0.005 Seconds) 

□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ 
□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□DO 
□ □□□□□□□Will check 100 files 



File comparison took 5.326 Seconds. 
Frequency table for: rename the file 





1    1 docs   34 times   cambie el nombre del archivo
3    1 docs   23 times   archivo
5    1 docs   12 times   cambiar
7    1 docs   11 times   nombre
9    1 docs   11 times   por ejemplo
11   1 docs   10 times   archivo de
13   1 docs   9 times    para que
15   1 docs   9 times    en el explorador de soluciones
17   1 docs   6 times    pueda
19   1 docs   6 times    extension

Potential translations (using overlap) for: you can also rename the file



1    you can also rename              rename the file
2    puede cambiar el nombre de un    un archivo
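The side-by-side candidates above suggest that translations of overlapping source phrases are joined where the end of one candidate repeats the start of the next. A minimal Python sketch of such a join follows, assuming a word-level suffix/prefix match. The source shows the paired candidates but does not spell out the joining rule, and `join_on_overlap` and `join_candidates` are hypothetical names.

```python
def join_on_overlap(left, right):
    """Join two phrases when the last words of `left` equal the first words of `right`.

    The longest shared run of words is used. Returns None when nothing overlaps.
    """
    left_words, right_words = left.split(), right.split()
    for k in range(min(len(left_words), len(right_words)), 0, -1):
        if left_words[-k:] == right_words[:k]:
            return " ".join(left_words + right_words[k:])
    return None


def join_candidates(left_candidates, right_candidates):
    """Try every left/right pair and keep only the joins that overlap."""
    joined = []
    for left in left_candidates:
        for right in right_candidates:
            result = join_on_overlap(left, right)
            if result is not None:
                joined.append(result)
    return joined


print(join_on_overlap("puede cambiar el nombre de un", "un archivo"))
```

With the candidates shown above, joining "puede cambiar el nombre de un" with "un archivo" gives "puede cambiar el nombre de un archivo". Pairs that share no boundary words are discarded, which is one way the overlap could filter the frequency-table entries.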



Checking db for: the file and write code that affects the project in order to complete 
the application for information on creating applications 
found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that affects the project in order to complete 
the application for information on creating 
found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that affects the project in order to complete 
the application for information on 
found in 1 files (took 0.039 Seconds) 



Checking db for: the file and write code that affects the project in order to complete 
the application for information 
found in 1 files (took 0.043 Seconds) 



Checking db for: the file and write code that affects the project in order to complete 

the application for 

found in 1 files (took 0.041 Seconds) 



Checking db for: the file and write code that affects the project in order to complete 
the application 

found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that affects the project in order to complete 
the 

found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that affects the project in order to complete 
found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that affects the project in order to 
found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that affects the project in order 
found in 1 files (took 0.040 Seconds) 
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Checking db for: the file and write code that affects the project in 
found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that affects the project 
found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that affects the 
found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that affects 
found in 1 files (took 0.040 Seconds) 



Checking db for: the file and write code that 
found in 1 files (took 0.039 Seconds) 



Checking db for: the file and write code 
found in 1 files (took 0.033 Seconds) 



Checking db for: the file and write 
found in 6 files (took 0.031 Seconds) 



Checking db for: the file and 

found in 664 files (took 0.432 Seconds) 

□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ 
□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ 
□ □□□□□□□Will check 100 files 



File comparison took 10.28 Seconds. 
Frequency table for: the file and 





1    1 docs   35 times   archivo y
3    1 docs   25 times   el archivo
5    1 docs   19 times   puede
7    1 docs   13 times   archivo de
9    1 docs   11 times   excel
11   1 docs   10 times   la funcion antes de guardar el archivo y reemplazara la formula con el valor resultante
13   1 docs   10 times   excel calculara la funcion antes de guardar el archivo y reemplazara la formula con el
15   1 docs   7 times    access
17   1 docs   7 times    version
19   1 docs   7 times    un archivo











Potential translations (using overlap) for: you can also rename the file and

you can also rename the file             the file and
puede cambiar el nombre de un archivo    archivo y
puede cambiar el nombre de un archivo    archivo de



Checking db for: file and write code that affects the project in order to complete the 
application for information on creating applications 
found in 1 files (took 0.012 Seconds) 



Checking db for: file and write code that affects the project in order to complete the 
application for information on creating 
found in 1 files (took 0.011 Seconds)



Checking db for: file and write code that affects the project in order to complete the 
application for information on 
found in 1 files (took 0.011 Seconds) 



Checking db for: file and write code that affects the project in order to complete the 

application for information 

found in 1 files (took 0.011 Seconds)



Checking db for: file and write code that affects the project in order to complete the 
application for 

found in 1 files (took 0.011 Seconds) 
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Checking db for: file and write code that affects the project in order to complete the 
application 

found in 1 files (took 0.011 Seconds) 



Checking db for: file and write code that affects the project in order to complete the 
found in 1 files (took 0.011 Seconds)



Checking db for: file and write code that affects the project in order to complete 
found in 1 files (took 0.011 Seconds)



Checking db for: file and write code that affects the project in order to 
found in 1 files (took 0.011 Seconds)



Checking db for: file and write code that affects the project in order 
found in 1 files (took 0.011 Seconds) 



Checking db for: file and write code that affects the project in 
found in 1 files (took 0.011 Seconds)



Checking db for: file and write code that affects the project 
found in 1 files (took 0.011 Seconds) 



Checking db for: file and write code that affects the 
found in 1 files (took 0.011 Seconds) 



Checking db for: file and write code that affects 
found in 1 files (took 0.009 Seconds) 



Checking db for: file and write code that 
found in 1 files (took 0.696 Seconds) 



Checking db for: file and write code 
found in 1 files (took 0.003 Seconds) 



Checking db for: file and write 

found in 14 files (took 0.001 Seconds) 

□ □□□□□□□□□□□□□Will check 14 files 



File comparison took 0.949 Seconds. 
Frequency table for: file and write 









1    1 docs   6 times       archivo
3    1 docs   4 times       escribir
4    1 docs   4 times       lineas restantes
5    1 docs   4 times       la primera parte del archivo y de grabar una parte de este en el disco
6    [illegible]            modificar la primera parte del archivo y de grabar una parte de
7    1 docs   4 times       de modificar la primera parte del archivo y de grabar una parte de este en
8    1 docs   [illegible]   [illegible]
9    1 docs   [illegible]   en el que se cifrara con la clave que proporciono
10   1 docs   3 times       puede escribir el comando siguiente para cargar en la memoria
11   1 docs   3 times       puede escribir el comando siguiente para cargar en la memoria las 100 lineas restantes
12   1 docs   2 times       disco
13   1 docs   2 times       archivo y
14   1 docs   2 times       una parte [illegible]
15   1 docs   2 times       siguiente
16   [illegible]   2 times  [illegible]
17   1 docs   2 times       y escribir
18   1 docs   2 times       [illegible]
19   1 docs   2 times       finalmente
20   1 docs   [illegible]   en un directorio de este tipo



Potential translations (using overlap) for : you can also rename the file and write 



you can also rename the file and 



file and write 



tambien puede cambiar el nombre de un archivo de | de modificar [illegible] archivo y de grabar una parte de [illegible]
puede cambiar el nombre de un archivo de | de modificar la primera parte del archivo y de grabar una parte de este en



Checking db for: and write code that affects the project in order to complete the 
application for information on creating applications 
found in 1 files (took 0.011 Seconds) 



Checking db for: and write code that affects the project in order to complete the 
application for information on creating 
found in 1 files (took 0.010 Seconds) 



Checking db for: and write code that affects the project in order to complete the 
application for information on 
found in 1 files (took 0.010 Seconds) 



Checking db for: and write code that affects the project in order to complete the 

application for information 

found in 1 files (took 0.010 Seconds) 



Checking db for: and write code that affects the project in order to complete the 
application for 

found in 1 files (took 0.010 Seconds) 



Checking db for: and write code that affects the project in order to complete the 
application 

found in 1 files (took 0.010 Seconds) 



Checking db for: and write code that affects the project in order to complete the 
found in 1 files (took 0.012 Seconds) 



Checking db for: and write code that affects the project in order to complete 
found in 1 files (took 0.011 Seconds) 



Checking db for: and write code that affects the project in order to 
found in 1 files (took 0.010 Seconds) 



Checking db for: and write code that affects the project in order 
found in 1 files (took 0.010 Seconds) 



Checking db for: and write code that affects the project in 
found in 1 files (took 0.011 Seconds) 



Checking db for: and write code that affects the project 
found in 1 files (took 0.010 Seconds) 



Checking db for: and write code that affects the 
found in 1 files (took 0.010 Seconds) 



Checking db for: and write code that affects 
found in 1 files (took 0.008 Seconds) 



Checking db for: and write code that 
found in 3 files (took 0.008 Seconds) 



Checking db for: and write code 
found in 35 files (took 0.002 Seconds) 

Will check 35 files 



File comparison took 2.702 Seconds. 
Frequency table for: and write code 







1 | 1 docs | [illegible] | codigo
2 | 1 docs | [illegible] | y escribir codigo
3 | 1 docs | [illegible] | y escribir codigo para
4 | 1 docs | [illegible] | controles
5 | 1 docs | [illegible] | codigo para
6 | 1 docs | [illegible] | agregar
7 | 1 docs | [illegible] | y escribir
8 | 1 docs | [illegible] | formulario
9 | 1 docs | [illegible] | el servicio
10 | 1 docs | 4 times | y escriba codigo
11 | 1 docs | 4 times | escribir codigo para
12 | 1 docs | 4 times | hacer doble clic [illegible]
13 | 1 docs | 4 times | en el editor de codigo
14 | 1 docs | 4 times | o hacer doble clic en un [illegible]
15 | 1 docs | 4 times | disenador y escribir codigo en
16 | 1 docs | [illegible] | [illegible]
17 | 1 docs | 4 times | que fue reemplazado automaticamente al crear el proyecto
18 | 1 docs | 4 times | o hacer doble clic en un control del formulario y escribir codigo para el [illegible]
19 | 1 docs | 4 times | clic en el disenador y escribir codigo en la seccion de declaraciones generales de la
20 | 1 docs | [illegible] | en el disenador y escribir codigo en la seccion de declaraciones [illegible]



Potential translations (using overlap) for : you can also rename the file and write 
code 



LeftSide | [remaining column headings illegible]
you can also rename the file and write | and write code
1 | tambien puede cambiar el nombre de un archivo y | [illegible] codigo
2 | puede cambiar el nombre de un archivo y | y escribir codigo
3 | [illegible] | y escribir codigo para
4 | puede cambiar el nombre de un archivo y | y escribir codigo para
5 | [illegible]
6 | puede cambiar el nombre de un archivo y | y escribir
7 | [illegible]
8 | puede cambiar el nombre de un archivo y | y escriba codigo
9 | [illegible]
10 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en | en el editor de codigo



Checking db for: write code that affects the project in order to complete the 
application for information on creating applications 
found in 1 files (took 0.018 Seconds) 



Checking db for: write code that affects the project in order to complete the 
application for information on creating 
found in 1 files (took 0.017 Seconds) 



Checking db for: write code that affects the project in order to complete the 
application for information on 
found in 1 files (took 0.018 Seconds) 



Checking db for: write code that affects the project in order to complete the 
application for information 
found in 1 files (took 0.017 Seconds) 



Checking db for: write code that affects the project in order to complete the 
application for 

found in 1 files (took 0.017 Seconds) 



Checking db for: write code that affects the project in order to complete the 
application 

found in 1 files (took 0.017 Seconds) 



Checking db for: write code that affects the project in order to complete the 
found in 1 files (took 0.017 Seconds) 



Checking db for: write code that affects the project in order to complete 
found in 1 files (took 0.017 Seconds) 



Checking db for: write code that affects the project in order to 
found in 1 files (took 0.017 Seconds) 



Checking db for: write code that affects the project in order 
found in 1 files (took 0.017 Seconds) 



Checking db for: write code that affects the project in 
found in 1 files (took 0.017 Seconds) 



Checking db for: write code that affects the project 
found in 1 files (took 0.009 Seconds) 



Checking db for: write code that affects the 
found in 1 files (took 0.008 Seconds) 



Checking db for: write code that affects 
found in 1 files (took 0.006 Seconds) 



Checking db for: write code that 
found in 126 files (took 0.005 Seconds) 

Will check 100 files 



File comparison took 9.389 Seconds. 
Frequency table for: write code that 











1 | 1 docs | 37 times | codigo que
2 | [illegible]
3 | 1 docs | 19 times | escribir
4 | [illegible]
5 | 1 docs | 16 times | por ejemplo
6 | [illegible]
7 | 1 docs | 12 times | escriba codigo que
8 | [illegible]
9 | 1 docs | 9 times | en una
10 | [illegible]
11 | 1 docs | 8 times | cuando
12 | [illegible]
13 | 1 docs | 8 times | el control
14 | [illegible]
15 | 1 docs | 8 times | la aplicacion
16 | [illegible]
17 | 1 docs | 7 times | para el
18 | [illegible]
19 | 1 docs | 7 times | elementos
20 | [illegible]





Potential translations (using overlap) for : you can also rename the file and write 
code that 











you can also rename the file and write code | write code that
1 | [illegible]
2 | puede cambiar el nombre de un archivo y escribir codigo | codigo que
3 | [illegible]
4 | puede cambiar el nombre de un archivo y escriba codigo | codigo que
5 | [illegible]
6 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo | codigo que
7 | [illegible]
8 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en | en una
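
One way to read the "Potential translations (using overlap)" tables is that a left-side translation and a right-side translation are joined when the last words of the left side equal the first words of the right side. The sketch below illustrates that reading only. The function name `join_on_overlap` and the word-level matching rule are assumptions made for illustration and are not taken from the log.

```python
from typing import Optional


def join_on_overlap(left: str, right: str) -> Optional[str]:
    """Join two phrases on their longest shared word boundary.

    Returns None when the end of `left` and the start of `right`
    share no words (assumed to correspond to "Cannot find overlap").
    """
    left_words, right_words = left.split(), right.split()
    # Try the longest possible overlap first, then shorter ones.
    for k in range(min(len(left_words), len(right_words)), 0, -1):
        if left_words[-k:] == right_words[:k]:
            return " ".join(left_words + right_words[k:])
    return None


print(join_on_overlap(
    "puede cambiar el nombre de un archivo y escribir codigo",
    "codigo que",
))
# prints: puede cambiar el nombre de un archivo y escribir codigo que
```

Trying the longest overlap first keeps a shared multi-word boundary from being duplicated in the joined phrase.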



Checking db for: code that affects the project in order to complete the application 
for information on creating applications 
found in 1 files (took 0.013 Seconds) 



Checking db for: code that affects the project in order to complete the application 
for information on creating 
found in 1 files (took 0.013 Seconds) 



Checking db for: code that affects the project in order to complete the application 

for information on 

found in 1 files (took 0.012 Seconds) 



Checking db for: code that affects the project in order to complete the application 
for information 

found in 1 files (took 0.012 Seconds) 



Checking db for: code that affects the project in order to complete the application 
for 

found in 1 files (took 0.013 Seconds) 



Checking db for: code that affects the project in order to complete the application 
found in 1 files (took 0.012 Seconds) 



Checking db for: code that affects the project in order to complete the 
found in 1 files (took 0.014 Seconds) 



Checking db for: code that affects the project in order to complete 
found in 1 files (took 0.012 Seconds) 



Checking db for: code that affects the project in order to 
found in 1 files (took 0.012 Seconds) 



Checking db for: code that affects the project in order 
found in 1 files (took 0.012 Seconds) 



Checking db for: code that affects the project in 
found in 1 files (took 0.011 Seconds) 



Checking db for: code that affects the project 
found in 1 files (took 0.003 Seconds) 



Checking db for: code that affects the 
found in 1 files (took 0.002 Seconds) 



Checking db for: code that affects 
found in 1 files (took 0.699 Seconds) 
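
The "Checking db for:" entries above appear to drop one word at a time from the right of the query until the phrase is found in enough files, and then to restart one word further into the sentence when that never happens. A minimal sketch of that loop follows. The function name, the `count_files` callback, and the two thresholds (`min_files`, `min_words`) are assumptions inferred from the pattern of the log, not values stated in it.

```python
from typing import Callable, Optional, Sequence, Tuple


def longest_indexed_phrase(
    words: Sequence[str],
    start: int,
    count_files: Callable[[str], int],
    min_files: int = 10,  # assumed cut-off; the log does not state it
    min_words: int = 3,   # assumed shortest phrase worth looking up
) -> Optional[Tuple[str, int]]:
    """Drop words from the right of words[start:] until enough files match."""
    for end in range(len(words), start + min_words - 1, -1):
        phrase = " ".join(words[start:end])
        hits = count_files(phrase)
        if hits >= min_files:
            return phrase, hits
    return None  # the caller would move `start` one word to the right


# Hypothetical file counts standing in for the database.
counts = {"that affects the": 27}


def lookup(phrase: str) -> int:
    return counts.get(phrase, 1)


words = "code that affects the project".split()
print(longest_indexed_phrase(words, 0, lookup))  # None
print(longest_indexed_phrase(words, 1, lookup))  # ('that affects the', 27)
```

With these made-up counts the sketch reproduces the behaviour seen in the log: nothing starting at "code" qualifies, while "that affects the" does.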



Checking db for: that affects the project in order to complete the application for 
information on creating applications 
found in 1 files (took 0.056 Seconds) 



Checking db for: that affects the project in order to complete the application for 

information on creating 

found in 1 files (took 0.055 Seconds) 



Checking db for: that affects the project in order to complete the application for 
information on 

found in 1 files (took 0.055 Seconds) 



Checking db for: that affects the project in order to complete the application for 
information 

found in 1 files (took 0.055 Seconds) 



Checking db for: that affects the project in order to complete the application for 
found in 1 files (took 0.055 Seconds) 



Checking db for: that affects the project in order to complete the application 
found in 1 files (took 0.055 Seconds) 



Checking db for: that affects the project in order to complete the 
found in 1 files (took 0.055 Seconds) 



Checking db for: that affects the project in order to complete 
found in 1 files (took 0.054 Seconds) 



Checking db for: that affects the project in order to 
found in 1 files (took 0.055 Seconds) 



Checking db for: that affects the project in order 
found in 1 files (took 0.011 Seconds) 



Checking db for: that affects the project in 
found in 1 files (took 0.010 Seconds) 



Checking db for: that affects the project 
found in 1 files (took 0.002 Seconds) 



Checking db for: that affects the 
found in 27 files (took 0.001 Seconds) 

Will check 27 files 



File comparison took 1.895 Seconds. 
Frequency table for: that affects the 



1 | 1 docs | 7 times | que afecta
2 | [illegible]
3 | 1 docs | 4 times | que afecta a
4 | [illegible]
5 | 1 docs | 3 times | afecta
6 | [illegible]
7 | 1 docs | 3 times | los datos
8 | [illegible]
9 | 1 docs | 3 times | monitor gamma mide el contraste que afecta a los medios tonos
10 | [illegible]
11 | 1 docs | 3 times | que le permite cambiar el brillo del monitor sin alterar las luces y las sombras
12 | 1 docs | 3 times | error grave que afecta al sistema operativo y que podria poner en peligro los datos
13 | 1 docs | 2 times | y que
14 | 1 docs | 2 times | cambiar el [illegible]
15 | 1 docs | 2 times | afecta a la
16 | 1 docs | 2 times | [illegible]
17 | 1 docs | 2 times | informacion
18 | 1 docs | [illegible] | [illegible]
19 | 1 docs | 2 times | metodo de envio o la codificacion de envio para ver si eso afecta a la
20 | [illegible] | [illegible] | [illegible] metodo de envio [illegible] para ver si eso [illegible]



Potential translations (using overlap) for : you can also rename the file and write 
code that affects the 











you can also rename the file and write code that | that affects the
1 | tambien puede cambiar el nombre de un archivo y escribir codigo que | [illegible]
2 | puede cambiar el nombre de un archivo y escribir codigo que | que afecta
3 | tambien puede cambiar el nombre de un archivo y escriba codigo que | [illegible]
4 | puede cambiar el nombre de un archivo y escriba codigo que | que afecta
5 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que | que afecta
6 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que | que afecta
7 | tambien puede cambiar el nombre de un archivo y [illegible] codigo que | que afecta al
8 | puede cambiar el nombre de un archivo y escribir codigo que | que afecta al
9 | tambien puede cambiar el nombre de un archivo y escriba codigo que | [illegible] afecta
10 | puede cambiar el nombre de un archivo y escriba codigo que | que afecta al
11 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que | que afecta
12 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que | que afecta al
13 | tambien puede cambiar el nombre de un archivo y escribir codigo que | que afecta [illegible]
14 | puede cambiar el nombre de un archivo y escribir codigo que | que afecta [illegible]
15 | [illegible]
16 | puede cambiar el nombre de un archivo y escriba codigo que | que afecta a
17 | [illegible]
18 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que | que afecta a



Checking db for: affects the project in order to complete the application for 
information on creating applications 
found in 1 files (took 0.059 Seconds) 



Checking db for: affects the project in order to complete the application for 

information on creating 

found in 1 files (took 0.058 Seconds) 



Checking db for: affects the project in order to complete the application for 
information on 

found in 1 files (took 0.058 Seconds) 



Checking db for: affects the project in order to complete the application for 
information 

found in 1 files (took 0.058 Seconds) 



Checking db for: affects the project in order to complete the application for 
found in 1 files (took 0.058 Seconds) 



Checking db for: affects the project in order to complete the application 
found in 1 files (took 0.058 Seconds) 



Checking db for: affects the project in order to complete the 
found in 1 files (took 0.058 Seconds) 



Checking db for: affects the project in order to complete 
found in 1 files (took 0.058 Seconds) 



Checking db for: affects the project in order to 
found in 1 files (took 0.054 Seconds) 



Checking db for: affects the project in order 
found in 1 files (took 0.010 Seconds) 



Checking db for: affects the project in 
found in 1 files (took 0.008 Seconds) 



Checking db for: affects the project 
found in 2 files (took 0.001 Seconds) 



Checking db for: the project in order to complete the application for information on 
creating applications 
found in 1 files (took 0.099 Seconds) 



Checking db for: the project in order to complete the application for information on 
creating 
found in 1 files (took 0.098 Seconds) 



Checking db for: the project in order to complete the application for information on 
found in 1 files (took 0.099 Seconds) 



Checking db for: the project in order to complete the application for information 
found in 1 files (took 0.099 Seconds) 



Checking db for: the project in order to complete the application for 
found in 1 files (took 0.098 Seconds) 



Checking db for: the project in order to complete the application 
found in 1 files (took 0.098 Seconds) 



Checking db for: the project in order to complete the 
found in 1 files (took 0.099 Seconds) 



Checking db for: the project in order to complete 
found in 1 files (took 0.058 Seconds) 



Checking db for: the project in order to 
found in 1 files (took 0.054 Seconds) 



Checking db for: the project in order 
found in 12 files (took 0.010 Seconds) 
Will check 12 files 



File comparison took 1.033 Seconds. 
Frequency table for: the project in order 

1 | 1 docs | 6 times | para que este disponible en el cuadro de dialogo seleccionar elemento en el proyecto
2 | 1 docs | 4 times | para que este disponible en el cuadro de dialogo
3 | 1 docs | 4 times | debe agregar al proyecto el archivo que contiene el
4 | 1 docs | 2 times | el archivo
5 | 1 docs | 2 times | al proyecto
6 | [illegible] | 2 times | debe agregar
7 | 1 docs | 2 times | el archivo que contiene
8 | 1 docs | 2 times | debe agregar al proyecto
9 | 1 docs | 2 times | el archivo que contiene el
10 | 1 docs | 2 times | el archivo que contiene el icono
11 | 1 docs | 2 times | debe agregar al proyecto el archivo que contiene [illegible]
12 | 1 docs | 2 times | para que este disponible en el cuadro de dialogo icono
13 | 1 docs | [illegible] | al proyecto para que este disponible en el cuadro de dialogo [illegible]
14 | 1 docs | 2 times | debe agregar al proyecto el archivo que contiene el mapa de bits [illegible]
15 | 1 docs | 2 times | agregar al proyecto el archivo que contiene el mapa de bits para que este disponible en el cuadro de dialogo [illegible]
16 | 1 docs | 2 times | el archivo que contiene el icono se debe agregar al proyecto para que este disponible en el cuadro de dialogo [illegible]
17 | 1 docs | 2 times | debe agregar al proyecto el archivo que contiene el contrato de licencia para que este disponible en el cuadro de [illegible]
18 | 1 docs | 2 times | debe agregar al proyecto el archivo que contiene el icono para que este disponible en el cuadro de dialogo icono
19 | 1 docs | 2 times | archivo que contiene el mapa de bits para que este disponible en el cuadro de dialogo seleccionar elemento en el
20 | 1 docs | [illegible] | el archivo que contiene el mapa de bits para que este disponible en



Cannot find overlap, trying something else 



Checking db for: the project in 
found in 181 files (took 0.007 Seconds) 

Will check 100 files 



File comparison took 8.229 Seconds. 
Frequency table for: the project in 



1 | 1 docs | 39 times | proyecto en
2 | 1 docs | [illegible] | [illegible]
3 | 1 docs | 19 times | el proyecto
4 | 1 docs | 17 times | al proyecto
5 | 1 docs | 17 times | proyecto en el
6 | 1 docs | 15 times | el proyecto en
7 | 1 docs | 12 times | puede
8 | 1 docs | 12 times | del proyecto
9 | 1 docs | 11 times | a continuacion
10 | 1 docs | [illegible] | en el explorador de soluciones
11 | 1 docs | 10 times | al proyecto en
12 | 1 docs | 9 times | visual
13 | 1 docs | 9 times | el proyecto en el
14 | 1 docs | 8 times | agregar [illegible]
15 | 1 docs | 8 times | proyecto de
16 | [illegible]
17 | 1 docs | 8 times | de la base de datos de muestra neptuno desde el proyecto de la base de
18 | 1 docs | 8 times | la base de datos de muestra neptuno desde el proyecto de la base de datos
19 | 1 docs | 8 times | al proyecto de la base de datos de muestra neptuno desde el proyecto de la
20 | [illegible] | [illegible] | proyecto de la base de datos de muestra neptuno desde el [illegible]



Potential translations (using overlap) for : you can also rename the file and write 





you can also rename the file and write code that affects the | the project in
1 | tambien puede cambiar el nombre de un archivo y escribir codigo [illegible] | [illegible]
2 | puede cambiar el nombre de un archivo y escribir codigo que afecta al | al proyecto
3 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al | al proyecto
4 | puede cambiar el nombre de un archivo y escriba codigo que afecta al | al proyecto
5 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al | al proyecto
6 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al | al proyecto
7 | tambien puede cambiar el nombre de un archivo y escribir codigo [illegible] | a continuacion
8 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta | a continuacion
9 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta | [illegible] continuacion
10 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta a | a continuacion
11 | [illegible] cambiar el nombre de un archivo y escribir codigo que [illegible] | [illegible] continuacion
12 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta a | a continuacion
13 | [illegible]
14 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta a | a continuacion
15 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar [illegible] de codigo que afecta a | [illegible] continuacion
16 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al | al proyecto en
17 | puede cambiar el nombre de un archivo y escribir codigo que [illegible] | [illegible] proyecto [illegible]
18 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al | al proyecto en
19 | puede cambiar el nombre de un archivo y escriba codigo que [illegible] | al proyecto [illegible]
20 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al | al proyecto en
21 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al | al proyecto



Checking db for: project in order to complete the application for information on 
creating applications 
found in 1 files (took 0.092 Seconds) 



Checking db for: project in order to complete the application for information on 
creating 
found in 1 files (took 0.092 Seconds) 



Checking db for: project in order to complete the application for information on 
found in 1 files (took 0.090 Seconds) 



Checking db for: project in order to complete the application for information 
found in 1 files (took 0.091 Seconds) 



Checking db for: project in order to complete the application for 
found in 1 files (took 0.091 Seconds) 



Checking db for: project in order to complete the application 
found in 1 files (took 0.090 Seconds) 



Checking db for: project in order to complete the 
found in 1 files (took 0.089 Seconds) 



Checking db for: project in order to complete 
found in 1 files (took 0.049 Seconds) 



Checking db for: project in order to 
found in 1 files (took 0.044 Seconds) 



Checking db for: project in order 
found in 24 files (took 0.001 Seconds) 

Will check 24 files 



File comparison took 1.656 Seconds. 
Frequency table for: project in order 









1 | 1 docs | 12 times | proyecto
2 | [illegible]
3 | 1 docs | 4 times | del proyecto
4 | [illegible]
5 | 1 docs | 4 times | para que este disponible en el cuadro de dialogo
6 | [illegible]
7 | 1 docs | 3 times | debe
8 | [illegible]
9 | 1 docs | 3 times | un proyecto
10 | 1 docs | 2 times | acceso
11 | 1 docs | 2 times | agregar
12 | [illegible] | [illegible] | para poder [illegible]
13 | 1 docs | 2 times | al proyecto
14 | 1 docs | 2 times | el proyecto [illegible]
15 | 1 docs | 2 times | debe agregar
16 | [illegible] | 2 times | [illegible]
17 | 1 docs | 2 times | el archivo que contiene
18 | [illegible] | [illegible] | debe agregar al proyecto [illegible]
19 | 1 docs | 2 times | el archivo que contiene el
20 | 1 docs | 2 times | el archivo que contiene el icono
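
The frequency tables in this log pair each candidate target-language phrase with a document count and an occurrence count. The log does not show how the counts are produced. As a rough illustration only, the sketch below counts every word sequence up to a maximum length across a set of target-language segments and ranks them by frequency. The function name `frequency_table`, the `max_len` and `top` parameters, and the tie-breaking order are all assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequency_table(
    segments: Iterable[str],
    max_len: int = 12,  # assumed longest phrase to count
    top: int = 20,      # the tables in the log show about 20 rows
) -> List[Tuple[str, int]]:
    """Rank word sequences by how often they occur across the segments."""
    counts: Counter = Counter()
    for segment in segments:
        tokens = segment.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    # Most frequent first; ties go to the longer phrase, then alphabetically.
    ranked = sorted(
        counts.items(),
        key=lambda item: (-item[1], -len(item[0].split()), item[0]),
    )
    return ranked[:top]


segments = ["agregar al proyecto", "el archivo al proyecto", "del proyecto"]
print(frequency_table(segments, max_len=3, top=3))
# [('proyecto', 3), ('al proyecto', 2), ('al', 2)]
```

Counting every sequence is quadratic in segment length, which is acceptable for a sketch over short segments but would need an index for the corpus sizes implied by the log.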



Potential translations (using overlap) for : you can also rename the file and write 
code that affects the project in order 







you can also rename the file and write code that affects the project in | project in order
1 | tambien puede cambiar [illegible] escribir codigo que afecta al proyecto | proyecto para
2 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto | proyecto para
3 | tambien puede cambiar [illegible] | proyecto [illegible]
4 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto | proyecto para
5 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor [illegible] que afecta al proyecto | proyecto para
6 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto | proyecto para



Checking db for: in order to complete the application for information on creating 
applications 

found in 1 files (took 0.096 Seconds) 



Checking db for: in order to complete the application for information on creating 
found in 1 files (took 0.095 Seconds) 



Checking db for: in order to complete the application for information on 
found in 1 files (took 0.095 Seconds) 



Checking db for: in order to complete the application for information 
found in 1 files (took 0.095 Seconds) 



Checking db for: in order to complete the application for 
found in 1 files (took 0.094 Seconds) 



Checking db for: in order to complete the application 
found in 1 files (took 0.091 Seconds) 



Checking db for: in order to complete the 
found in 5 files (took 0.090 Seconds) 



Checking db for: in order to complete 
found in 7 files (took 0.053 Seconds) 



Checking db for: in order to 

found in 1000 files (took 0.033 Seconds) 

Will check 100 files 



File comparison took 7.183 Seconds. 
Frequency table for: in order to 







1 | 1 docs | 25 times | informe
2 | [illegible]
3 | 1 docs | 12 times | puede
4 | [illegible]
5 | 1 docs | 10 times | datos
6 | [illegible]
7 | 1 docs | 9 times | los datos
8 | [illegible]
9 | 1 docs | 7 times | un informe
10 | [illegible]
11 | 1 docs | 6 times | valor
12 | [illegible]
13 | 1 docs | 6 times | necesita
14 | [illegible]
15 | 1 docs | 6 times | en su informe
16 | [illegible]
17 | 1 docs | 5 times | crear
18 | 1 docs | [illegible] | si el
19 | 1 docs | 5 times | un valor
20 | [illegible]



Potential translations (using overlap) for : you can also rename the file and write 
code that affects the project in order to 



you can also rename the file and write code that affects the project in order | in order to
1 | tambien puede cambiar el [illegible] codigo que afecta al proyecto para | para poder
2 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para | para poder
3 | [illegible] al proyecto para | [illegible] poder
4 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para | para poder
5 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | [illegible] poder
6 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | para poder
7 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto en | [illegible]
8 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto en | en su informe
9 | [illegible] | [illegible] su [illegible]
10 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto en | en su informe
11 | tambien puede cambiar [illegible] primera parte del archivo [illegible] codigo que afecta al proyecto en | [illegible]
12 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto en | en su informe



Checking db for: order to complete the application for information on creating 
applications 
found in 1 files (took 0.055 Seconds) 



Checking db for: order to complete the application for information on creating 
found in 1 files (took 0.053 Seconds) 



Checking db for: order to complete the application for information on 
found in 1 files (took 0.053 Seconds) 



Checking db for: order to complete the application for information 
found in 1 files (took 0.050 Seconds) 



Checking db for: order to complete the application for 
found in 1 files (took 0.048 Seconds) 



Checking db for: order to complete the application 
found in 1 files (took 0.045 Seconds) 



Checking db for: order to complete the 
found in 33 files (took 0.044 Seconds) 

Will check 33 files 



File comparison took 1.949 Seconds. 
Frequency table for: order to complete the 









1 | 1 docs | 8 times | ademas 
2 | [illegible] 
3 | 1 docs | 7 times | completar 
4 | [illegible] 
5 | 1 docs | 4 times | completar la 
6 | [illegible] 
7 | 1 docs | 4 times | para poder completar 
8 | [illegible] 
9 | 1 docs | 4 times | necesitara saber la contrasena 
10 | [illegible] 
11 | 1 docs | 4 times | cuando usted u otro usuario intente conectarse a internet 
12 | [illegible] 
13 | 1 docs | 4 times | necesitara saber la contrasena de administrador para poder completar el procedimiento siguiente 
14 | [illegible] 
15 | 1 docs | 3 times | para poder realizar la 
16 | [illegible] 
17 | 1 docs | 2 times | si no 
18 | 1 docs | 2 times | cuando 
19 | 1 docs | 2 times | a internet 
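
The frequency table above ranks target-language word strings by how often they recur in the documents that contain the source phrase. The log does not show how the counts are gathered, so the following Python sketch is only one plausible way to do it: count every word n-gram in the aligned target passages and sort by frequency. The function names, the `max_len` limit and the sample passages are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple


def ngrams(tokens: List[str], max_len: int) -> Iterator[str]:
    """Yield every run of 1..max_len consecutive words."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def frequency_table(
    target_passages: Iterable[str],
    max_len: int = 4,
    top: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Count word strings across passages, most frequent first."""
    counts: Counter = Counter()
    for passage in target_passages:
        counts.update(ngrams(passage.lower().split(), max_len))
    return counts.most_common(top)


# Hypothetical target-language passages aligned with the source phrase.
passages = [
    "para poder completar la instalacion",
    "necesita permisos para poder completar el paso",
]
table = dict(frequency_table(passages, max_len=3))
print(table["para poder completar"])
```

Word strings that appear in both passages, such as "para poder completar", rise to the top, which matches the kind of entries the table lists.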









Potential translations (using overlap) for : you can also rename the file and write 
code that affects the project in order to complete the 







No. | you can also rename the file and write code that affects the project in order to | order to complete the 
1 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para | para poder 
2 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para | para poder 
3 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para | para poder 
4 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para | para poder 
5 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | para poder 
6 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | para poder 
7 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para | para completar la 
8 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para | para completar la 
9 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para | para completar la 
10 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para | para completar la 
11 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | para completar la 
12 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | para completar la 
13 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para | para poder completar 
14 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para | para poder completar 
15 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para | para poder completar 
16 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para | para poder completar 
17 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | para poder completar 
18 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | para poder completar 
19 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para | para finalizar la combinacion de correspondencia 
20 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para | para finalizar la combinacion de correspondencia 
21 | [illegible] | [illegible] 
22 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para | para finalizar la combinacion de correspondencia 
23 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | para finalizar la combinacion de correspondencia 
24 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para | para finalizar la combinacion de correspondencia 




Checking db for: to complete the application for information on creating 
applications 

found in 1 files (took 0.096 Seconds) 



Checking db for: to complete the application for information on creating 
found in 1 files (took 0.095 Seconds) 



Checking db for: to complete the application for information on 
found in 1 files (took 0.095 Seconds) 



Checking db for: to complete the application for information 
found in 1 files (took 0.049 Seconds) 



Checking db for: to complete the application for 
found in 1 files (took 0.048 Seconds) 



Checking db for: to complete the application 
found in 4 files (took 0.043 Seconds) 



Checking db for: complete the application for information on creating applications 
found in 1 files (took 0.067 Seconds) 



Checking db for: complete the application for information on creating 
found in 1 files (took 0.070 Seconds) 



Checking db for: complete the application for information on 
found in 1 files (took 0.050 Seconds) 



Checking db for: complete the application for information 
found in 1 files (took 0.005 Seconds) 



Checking db for: complete the application for 
found in 1 files (took 0.004 Seconds) 



Checking db for: complete the application 
found in 4 files (took 0.001 Seconds) 



Checking db for: the application for information on creating applications 
found in 1 files (took 0.067 Seconds) 



Checking db for: the application for information on creating 
found in 1 files (took 0.065 Seconds) 



Checking db for: the application for information on 
found in 1 files (took 0.049 Seconds) 



Checking db for: the application for information 
found in 1 files (took 0.005 Seconds) 



Checking db for: the application for 
found in 74 files (took 0.003 Seconds) 

Will check 74 files 



File comparison took 4.957 Seconds. 
Frequency table for: the application for 

1 | 1 docs | 26 times | la aplicacion 
2 | 1 docs | 22 times | la aplicacion para 
3 | 1 docs | 12 times | de la aplicacion para 
4 | 1 docs | 10 times | puede 
5 | 1 docs | 8 times | cada 
6 | 1 docs | 8 times | a continuación 
7 | 1 docs | 7 times | mediante el conjunto api awe y el nucleo de pae 
8 | 1 docs | 7 times | gb de memoria fisica restantes estan disponibles para que la aplicacion pueda usarlos como memoria 
9 | 1 docs | 7 times | de memoria fisica restantes estan disponibles para que la aplicacion pueda usarlos como memoria cache 
10 | [illegible] | [illegible] | haga clic en propiedades 
11 | 1 docs | 6 times | los 12 gb de memoria fisica restantes estan disponibles para que la aplicacion pueda usarlos 
12 | 1 docs | 5 times | crear 
13 | 1 docs | 5 times | en el panel de detalles 
14 | 1 docs | 4 times | para [illegible] 
15 | 1 docs | 4 times | haga clic en 
16 | 1 docs | 4 times | en la aplicacion 
17 | 1 docs | 4 times | microsoft windows 
18 | 1 docs | 4 times | la aplicacion para que 
19 | 1 docs | 4 times | microsoft windows notepad y microsoft word 
20 | 1 docs | 4 times | de la [illegible] haga clic en el nombre de la aplicacion a que 



Potential translations (using overlap) for : you can also rename the file and write 
code that affects the project in order to complete the application for 



No. | you can also rename the file and write code that affects the project in order to complete the | the application for 
1 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la | la aplicacion 
2 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la | la aplicacion 
3 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la | la aplicacion 
4 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la | la aplicacion 
5 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la | la aplicacion 
6 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la | la aplicacion 
7 | [illegible] | [illegible] 
8 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la | la aplicacion para 
9 | [illegible] | [illegible] 
10 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la | la aplicacion para 
11 | [illegible] | [illegible] 
12 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la | la aplicacion para 



Checking db for: application for information on creating applications 
found in 1 files (took 0.063 Seconds) 



Checking db for: application for information on creating 
found in 1 files (took 0.061 Seconds) 



Checking db for: application for information on 
found in 1 files (took 0.044 Seconds) 



Checking db for: application for information 
found in 7 files (took 0.001 Seconds) 



Checking db for: for information on creating applications 
found in 1 files (took 0.063 Seconds) 



Checking db for: for information on creating 
found in 88 files (took 0.063 Seconds) 

Will check 88 files 



File comparison took 7.270 Seconds. 

Frequency table for: for information on creating 

1 | 1 docs | 59 times | haga clic en 
2 | 1 docs | 31 times | como crear 
3 | 1 docs | 27 times | informacion 
4 | 1 docs | 24 times | para obtener informacion 
5 | 1 docs | 17 times | vea 
6 | 1 docs | 16 times | para obtener 
7 | 1 docs | 13 times | información sobre 
8 | 1 docs | [illegible] times | como crear un 
9 | 1 docs | 11 times | para obtener informacion sobre como crear 
10 | 1 docs | [illegible] times | informacion acerca de 
11 | 1 docs | 10 times | para obtener informacion acerca de como crear 
12 | 1 docs | 9 times | información acerca de como crear 
13 | 1 docs | 9 times | para obtener informacion sobre como crear un 
14 | 1 docs | 8 times | datos 
15 | 1 docs | 8 times | la creacion de 
16 | 1 docs | 8 times | información sobre como crear 
17 | 1 docs | 8 times | para obtener mas informacion sobre como crear un campo calculado en una consulta de una base de datos de microsoft 
18 | 1 docs | 8 times | obtener mas informacion sobre como crear un campo calculado en una consulta de una base de datos de microsoft access 
19 | 1 docs | 7 times | para obtener informacion sobre 
20 | 1 docs | 6 times | como 







Potential translations (using overlap) for : you can also rename the file and write 
code that affects the project in order to complete the application for information on 
creating 





No. | you can also rename the file and write code that affects the project in order to complete the application for | for information on creating 
1 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion 
2 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion 
3 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion 
4 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion 
5 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion 
6 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion 
7 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener 
8 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener 
9 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener 
10 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener 
11 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener 
12 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener 
13 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear 
14 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear 
15 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear 
16 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear 
17 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear 
18 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear 
19 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion acerca de como crear 
20 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion acerca de como crear 
21 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion acerca de como crear 
22 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion acerca de como crear 
23 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion acerca de como crear 
24 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion acerca de como crear 
25 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear un 
26 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear un 
27 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear un 
28 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear un 
29 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear un 
30 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para | para obtener informacion sobre como crear un 

Checking db for: information on creating applications 
found in 1 files (took 0.017 Seconds) 



Checking db for: on creating applications 
found in 1 files (took 0.001 Seconds) 



Checking db for: creating applications 
found in 50 files (took 0.002 Seconds) 

Will check 50 files 



File comparison took 2.627 Seconds. 
Frequency table for: creating applications 











1 | 1 docs | 29 times | aplicaciones 
2 | [illegible] 
3 | 1 docs | 9 times | crear aplicaciones 
4 | [illegible] 
5 | 1 docs | 6 times | estructuras de datos 
6 | [illegible] 
7 | 1 docs | 5 times | microsoft 
8 | [illegible] 
9 | 1 docs | 5 times | aplicaciones en 
10 | [illegible] 
11 | 1 docs | 4 times | utilizar 
12 | [illegible] 
13 | 1 docs | 4 times | procedimientos 
14 | [illegible] 
15 | 1 docs | 4 times | crear aplicaciones en 
16 | [illegible] 
17 | 1 docs | 4 times | o bajo la plataforma windows nt 
18 | [illegible] 
19 | 1 docs | 4 times | aplicaciones que se ejecuten bajo windows 95 o bajo la 
20 | [illegible] 









Potential translations (using overlap) for: you can also rename the file and write 
code that affects the project in order to complete the application for information on 
creating applications 

No. | you can also rename the file and write code that affects the project in order to complete the application for information on creating | creating applications 
1 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones 
2 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones 
3 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones 
4 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones 
5 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones 
6 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones 
7 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones 
8 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones 
9 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones 
10 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones 
11 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones 
12 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones 
13 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones en 
14 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones en 
15 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones en 
16 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones en 
17 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones en 
18 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para obtener informacion sobre como crear | crear aplicaciones en 
19 | tambien puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones en 
20 | puede cambiar el nombre de un archivo y escribir codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones en 
21 | tambien puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones en 
22 | puede cambiar el nombre de un archivo y escriba codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones en 
23 | tambien puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones en 
24 | puede cambiar el nombre de un archivo de modificar la primera parte del archivo y de grabar una parte de este en el editor de codigo que afecta al proyecto para completar la aplicacion para obtener informacion acerca de como crear | crear aplicaciones en 



Translation process complete (took 245.6 seconds) 

English: you can also rename the file and write code that affects the 
project in order to complete the application for information on creating 
applications 

Spanish: tambien puede cambiar el nombre de un archivo y escribir 
codigo que afecta al proyecto para completar la aplicacion para obtener 
informacion sobre como crear aplicaciones 
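
Each "Potential translations (using overlap)" table pairs a translation built so far with a candidate for the next segment, and the two share words at the seam (for example a left side ending in "para" and a right side starting with "para"). A minimal Python sketch of joining two such strings on their longest shared word run follows. The function name and the rule of preferring the longest overlap are assumptions made for illustration; the log does not spell out the joining rule.

```python
from typing import Optional


def join_on_overlap(left: str, right: str) -> Optional[str]:
    """Join two word strings on the longest run of words that ends
    `left` and begins `right`; return None when they share no seam."""
    lw, rw = left.split(), right.split()
    for k in range(min(len(lw), len(rw)), 0, -1):
        if lw[-k:] == rw[:k]:
            return " ".join(lw + rw[k:])
    return None


left = ("puede cambiar el nombre de un archivo y escribir codigo "
        "que afecta al proyecto para")
right = "para completar la"
joined = join_on_overlap(left, right)
print(joined)
```

Pairs with no shared words at the seam return `None`, so they would simply not appear as candidates.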

Appendix C 

* * * 

Now searching for "unless we will have a copy" from english to french 
Checking: unless we will have a copy 
db check took 0.269 Seconds 
0 files found ** 
Calling Triangulation 

'unless we will have a copy', from EN to FR = a moins que nous ayons une 
copie 

'unless we will have a copy', from EN to DE = 'es sei denn wir eine Kopie 
haben' and back to FR its 'c'est nous que une copie a' 

'unless we will have a copy', from EN to EL = 'εκτός αν θα έχουμε ένα 
αντίγραφο' and back to FR its 'a moins que nous ayons une copie' 

'unless we will have a copy', from EN to ES = 'a menos que tengamos una 
copia' and back to FR its 'a moins que nous ayons une copie' 

'unless we will have a copy', from EN to IT = 'a meno che abbiamo una copia' 
and back to FR its 'moins que nous avons une copie' 

'unless we will have a copy', from EN to KO = '[illegible]' 
and back to FR its 'Nous quand il y a une copie la rancune' 

'unless we will have a copy', from EN to NL = 'tenzij wij een exemplaar zullen 
hebben' and back to FR its 'a moins que nous une copie' 

'unless we will have a copy', from EN to PT = 'a menos que nos tivermos uma 
copia' and back to FR its 'a moins que nous ayons une copie' 

'unless we will have a copy', from EN to RU = 'Если мы не будем иметь 
копию' and back to FR its 'Si nous n'aurons pas une copie' 

The Triangulation process took 12.58 sec. 
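
The triangulation pass above produces one direct EN to FR translation plus one French candidate per pivot language. The log does not state how the winner is chosen; the Python sketch below assumes a simple agreement count in which the candidate produced most often wins. The function name is hypothetical, and the candidate strings are taken from the log lines above (the DE entry is a repaired reading of a garbled line).

```python
from collections import Counter
from typing import Dict, Tuple


def pick_by_agreement(direct: str, via_pivot: Dict[str, str]) -> Tuple[str, int]:
    """Return the candidate that the direct route and the pivot routes
    agree on most often, together with its number of votes."""
    votes = Counter([direct, *via_pivot.values()])
    best, count = votes.most_common(1)[0]
    return best, count


direct_fr = "a moins que nous ayons une copie"
back_to_fr = {
    "DE": "c'est nous que une copie a",
    "EL": "a moins que nous ayons une copie",
    "ES": "a moins que nous ayons une copie",
    "IT": "moins que nous avons une copie",
    "KO": "Nous quand il y a une copie la rancune",
    "NL": "a moins que nous une copie",
    "PT": "a moins que nous ayons une copie",
    "RU": "Si nous n'aurons pas une copie",
}
winner = pick_by_agreement(direct_fr, back_to_fr)
print(winner)
```

Under this assumed rule the direct route and the EL, ES and PT pivots agree, which is consistent with the phrase the log carries forward to the back-check step.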



Checking "a moins que nous ayons une copie" back to original language. 

'a moins que nous ayons une copie', from FR to EN = unless we have a copy 

'a moins que nous ayons une copie', from FR to DE = 'es sei denn wir eine 
Kopie haben' and back to EN its 'it is we a copy has' 

'a moins que nous ayons une copie', from FR to EL = 'moins que nous v ayons 
une copie' and back to EN its 'moins que nous y'! ayons une copie' 

'a moins que nous ayons une copie', from FR to ES = 'a menos que tengamos 
una copia' and back to EN its 'unless we have a copy' 

'a moins que nous ayons une copie', from FR to IT = 'a meno che abbiamo una 
copia' and back to EN its 'less that we have one copy' 

'a moins que nous ayons une copie', from FR to KO = '[illegible]' 
and back to EN its 'Grudge us who are not when it is the copy' 

'a moins que nous ayons une copie', from FR to NL = 'tenzij wij een 
exemplaar hebben' and back to EN its 'unless we have a copy' 

'a moins que nous ayons une copie', from FR to PT = 'a menos que nos 
tivermos uma cópia' and back to EN its 'unless we have a copy' 

'a moins que nous ayons une copie', from FR to RU = '' and back to EN its '' 

The Triangulation process took 12.90 sec. 



Checking: unless we will have a 

db check took 0.225 Seconds 

0 files found ** 

Calling Triangulation 

'unless we will have a', from EN to FR = a moins que nous ayons a 

'unless we will have a', from EN to DE = 'es sei denn wir a haben' and back to 
FR its 'c'est que nous A a' 

'unless we will have a', from EN to EL = 'εκτός αν θα έχουμε το a' and back to 
FR its 'a moins que nous ayons le a' 

'unless we will have a', from EN to ES = 'a menos que tengamos a' and back to 
FR its 'a moins que nous ayons a' 

'unless we will have a', from EN to IT = 'a meno che abbiamo a' and back to FR 
its 'moins que nous devons' 

'unless we will have a', from EN to KO = '[illegible]' and back 
to FR its 'Nous quand il y a un }a{ la rancune' 

'unless we will have a', from EN to NL = 'tenzij wij a zullen hebben' and back 
to FR its 'a moins que nous a' 

'unless we will have a', from EN to PT = 'a menos que nos tivermos a' and back 
to FR its 'a moins que nous ayons' 

'unless we will have a', from EN to RU = 'Если мы не будем иметь a' and 
back to FR its 'Si nous n'aurons pas A' 

The Triangulation process took 12.51 sec. 



Checking: unless we will have 

db check took 0.124 Seconds 

0 files found ** 

Calling Triangulation 

'unless we will have', from EN to FR = a moins que nous ayons 

'unless we will have', from EN to DE = 'es sei denn wir haben' and back to FR 
its 'c'est nous a' 

'unless we will have', from EN to EL = 'εκτός αν θα έχουμε' and back to FR its 
'a moins que nous ayons' 

'unless we will have', from EN to ES = 'a menos que tengamos' and back to FR 
its 'a moins que nous ayons' 

'unless we will have', from EN to IT = 'a meno che abbiamo' and back to FR its 
'moins que nous avons' 

'unless we will have', from EN to KO = '[illegible]' and back to FR 
its 'Quand il y a de nous la rancune' 

'unless we will have', from EN to NL = 'tenzij wij zullen hebben' and back to 
FR its 'a moins que nous' 

'unless we will have', from EN to PT = 'a menos que nós tivermos' and back to 
FR its 'a moins que nous ayons' 

'unless we will have', from EN to RU = 'Если мы не будем иметь' and back to 
FR its 'Si nous n'aurons pas' 

The Triangulation process took 7.314 sec. 



Checking "a moins que nous ayons" back to original language. 

'a moins que nous ayons', from FR to EN = unless we have 

'a moins que nous ayons', from FR to DE = 'es sei denn wir haben' and back to 
EN its 'it is we has' 

'a moins que nous ayons', from FR to EL = 'moins que nous $ ayons' and back 
to EN its 'moins que nous y'! ayons' 

'a moins que nous ayons', from FR to ES = 'a menos que tengamos' and back to 
EN its 'unless we have' 

'a moins que nous ayons', from FR to IT = 'a meno che abbiamo' and back to 
EN its 'less that we have' 

'a moins que nous ayons', from FR to KO = '[illegible]' and back 
to EN its 'When there are grudge we who are not' 

'a moins que nous ayons', from FR to NL = 'tenzij wij hebben' and back to EN 
its 'unless we have' 

'a moins que nous ayons', from FR to PT = 'a menos que nos tivermos' and 
back to EN its 'unless we have' 

'a moins que nous ayons', from FR to RU = '' and back to EN its '' 

The Triangulation process took 12.15 sec. 



Checking: unless we will 

db check took 0.001 Seconds 
0 files found ** 
Calling Triangulation 

'unless we will', from EN to FR = a moins que nous 

'unless we will', from EN to DE = 'es sei denn wir werden' and back to FR its 
'c'est nous devient' 

'unless we will', from EN to EL = 'εκτός αν' and back to FR its 'a moins que' 

'unless we will', from EN to ES = 'a menos que' and back to FR its 'a moins 
que' 

'unless we will', from EN to IT = 'a meno che' and back to FR its 'moins que' 

'unless we will', from EN to KO = '[illegible]' and back to FR its 'La 
rancune où nous ne sommes pas' 

'unless we will', from EN to NL = 'tenzij wij zullen' and back to FR its 'a moins 
que nous' 

'unless we will', from EN to PT = 'a menos que nos' and back to FR its 'a moins 
que nous' 

'unless we will', from EN to RU = 'Если мы не будем' and back to FR its 'Si 
nous ne serons pas' 

The Triangulation process took 10.56 sec. 



Checking "a moins que" back to original language. 

'a moins que', from FR to EN = unless 

'a moins que', from FR to DE = 'es sei denn' and back to EN its 'it is' 

'a moins que', from FR to EL = 'i5 moins que' and back to EN its 'y'! moins que' 

'a moins que', from FR to ES = 'a menos que' and back to EN its 'unless' 

'a moins que', from FR to IT = 'a meno che' and back to EN its 'less than' 

'a moins que', from FR to KO = '[illegible]' and back to EN its 'The grudge which 
is not' 

'a moins que', from FR to NL = 'tenzij' and back to EN its 'unless' 

'a moins que', from FR to PT = 'a menos que' and back to EN its 'unless' 

'a moins que', from FR to RU = '' and back to EN its '' 

The Triangulation process took 7.903 sec. 



Checking: unless we 

db check took 0.093 Seconds 
first grep took 2.003 Seconds 
found in 1000 files 

Rule-based translation #1 = A moins que nous

translated it in 0.702 Seconds 

Rule-based translation #2 = a moins que nous 

translated it in 5.394 Seconds 

999 of 1000 files contain a pair (source and target language). 
Checking: A moins que nous

grep in target language took 0.233 Seconds 20 found,
counting in files took 0.018 Seconds
Found in 16 files.

File #0 eng/hansard_disc/set_a/a0/a_012.89.eng - total words: 1786;
Locations: 578. french file.

File #1 eng/hansard_disc/set_a/a0/a_020.29.eng - total words: 2004;
Locations: 760. french file.

File #2 eng/hansard_disc/set_a/a0/a_008.9.eng - total words: 1972;
Locations: 919. french file.

File #3 eng/hansard_disc/set_a/a0/a_009.24.eng - total words: 2319;
Locations: 953. french file.

File #4 eng/hansard_disc/set_a/a0/a_026.37.eng - total words: 2320;
Locations: 1895. french file.

File #5 eng/hansard_disc/set_a/a0/a_006.25.eng - total words: 2285;
Locations: 1637. french file.

File #6 eng/hansard_disc/set_a/a0/a_015.61.eng - total words: 2314;
Locations: 236.948. french file.

File #7 eng/hansard_disc/set_a/a0/a_031.53.eng - total words: 2495;
Locations: 1446. french file.

File #8 eng/hansard_disc/set_a/a0/a_011.78.eng - total words: 2448;
Locations: 1470. french file.

File #9 eng/hansard_disc/set_a/a0/a_014.92.eng - total words: 2511;
Locations: 1867. french file.

File #10 eng/hansard_disc/set_a/a0/a_014.38.eng - total words: 2387;
Locations: 2098. french file.

File #11 eng/hansard_disc/set_a/a0/a_017.82.eng - total words: 2437;
Locations: 1333. french file.

File #12 eng/hansard_disc/set_a/a0/a_013.1.eng - total words: 2380;
Locations: 1638.2213. french file.

File #13 eng/hansard_disc/set_a/a0/a_029.25.eng - total words: 2526;
Locations: 1514. french file.

File #14 eng/hansard_disc/set_a/a0/a_027.42.eng - total words: 2577;
Locations: 2124. french file.

File #15 eng/hansard_disc/set_a/a0/a_006.93.eng - total words: 2621;
Locations: 2534. french file.
Checking: a moins que nous 
grep in target language took 0.237 Seconds 20 found,
counting in files took 0.019 Seconds
Found in 16 files.

File #0 eng/hansard_disc/set_a/a0/a_012.89.eng - total words: 1786;
Locations: 578. french file.

File #1 eng/hansard_disc/set_a/a0/a_020.29.eng - total words: 2004;
Locations: 760. french file.

File #2 eng/hansard_disc/set_a/a0/a_008.9.eng - total words: 1972;
Locations: 919. french file.

File #3 eng/hansard_disc/set_a/a0/a_009.24.eng - total words: 2319;
Locations: 953. french file.

File #4 eng/hansard_disc/set_a/a0/a_026.37.eng - total words: 2320;
Locations: 1895. french file.

File #5 eng/hansard_disc/set_a/a0/a_006.25.eng - total words: 2285;
Locations: 1637. french file.

File #6 eng/hansard_disc/set_a/a0/a_015.61.eng - total words: 2314;
Locations: 236.948. french file.

File #7 eng/hansard_disc/set_a/a0/a_031.53.eng - total words: 2495;
Locations: 1446. french file.

File #8 eng/hansard_disc/set_a/a0/a_011.78.eng - total words: 2448;
Locations: 1470. french file.

File #9 eng/hansard_disc/set_a/a0/a_014.92.eng - total words: 2511;
Locations: 1867. french file.

File #10 eng/hansard_disc/set_a/a0/a_014.38.eng - total words: 2387;
Locations: 2098. french file.

File #11 eng/hansard_disc/set_a/a0/a_017.82.eng - total words: 2437;
Locations: 1333. french file.

File #12 eng/hansard_disc/set_a/a0/a_013.1.eng - total words: 2380;
Locations: 1638.2213. french file.

File #13 eng/hansard_disc/set_a/a0/a_029.25.eng - total words: 2526;
Locations: 1514. french file.

File #14 eng/hansard_disc/set_a/a0/a_027.42.eng - total words: 2577;
Locations: 2124. french file.

File #15 eng/hansard_disc/set_a/a0/a_006.93.eng - total words: 2621;
Locations: 2534. french file.
Last search took 13.44 
*true* 






Frequency table for: unless we

No. | Appears in # of Documents | English count | French
1   | 13 docs                   | 13 times      | a moins que nous



Starting to translate ,false,false,french,true,eng,fre 

Trying to translate 

So far I have a good overlap 0 

Checking: we will have a copy 

db check took 0.297 Seconds 

0 files found ** 

Calling Triangulation

'we will have a copy', from EN to FR = nous aurons une copie 

'we will have a copy', from EN to DE = 'wir haben eine Kopie' and back to FR
its 'nous avons une copie' 

'we will have a copy', from EN to EL = 'θα έχουμε ένα αντίγραφο' and back to
FR its 'nous aurons une copie'

'we will have a copy', from EN to ES = 'tendremos una copia' and back to FR 
its 'nous aurons un copie' 

'we will have a copy', from EN to IT = 'avremo una copia' and back to FR its 
'nous aurons une copie' 

'we will have a copy', from EN to KO = '^ El ArSOl 30 1 Dp and back
to FR its 'Nous serons la copie' 

'we will have a copy', from EN to NL = 'wij zullen een exemplaar hebben' and
back to FR its 'nous aurons une copie' 

'we will have a copy', from EN to PT = 'nos teremos uma copia' and back to FR 
its 'nous aurons une copie' 

'we will have a copy', from EN to RU = 'Мы будем иметь копию' and back to
FR its 'Nous aurons une copie'

The Triangulation process took 17.77 sec. 



Checking "nous aurons une copie" back to original language. 

'nous aurons une copie', from FR to EN = we will have a copy 

'nous aurons une copie', from FR to DE = 'wir haben eine Kopie' and back to 
EN its 'we have a copy' 

'nous aurons une copie', from FR to EL = 'nous aurons une copie' and back to 
EN its 'nous aurons une copie' 

'nous aurons une copie', from FR to ES = 'tendremos una copia' and back to 
EN its 'we will have one copies' 

'nous aurons une copie', from FR to IT = 'avremo una copia' and back to EN its 
'we will have one copy' 

'nous aurons une copie', from FR to KO = '^dl^ 91m 3*0\Q ArSOI'and 
back to EN its 'The copy which means will be we' 

'nous aurons une copie', from FR to NL = 'wij zullen een exemplaar hebben' 
and back to EN its 'we will have a copy' 

'nous aurons une copie', from FR to PT = 'nos teremos uma copia' and back to 
EN its 'we will have a copy' 

'nous aurons une copie', from FR to RU = " and back to EN its " 



The Triangulation process took 8.645 sec. 
Frequency table for: we will have a copy

No. | Appears in # of Documents | English count | French
1   | 20 docs                   | 9 times       | nous aurons une copie

English: unless we will have a copy 
French: 

Starting to translate unless we will have a copy,false,false,french,true,eng,fre 
select lang,olang from peanut where lang = 'unless we will have a copy' order 
by langcount desc - 0 

Current string to be translated = unless we will have a copy 
Got Here.... 
What now? true 

1) a moins que nous aurons une copie 

The translation process took 117.0 sec.
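
The log above translates each chunk directly into the target language and also through several pivot languages, then compares the results. The following Python sketch illustrates that round-trip comparison. It is only an illustration under stated assumptions: the log does not say how the results are combined, so the simple vote count is an assumption, and `triangulate`, `toy_translate`, and `TABLE` are hypothetical names. The lookup table replays a few entries copied from the log in place of a real translation engine.

```python
from collections import Counter
from typing import Callable, Iterable

# A translator takes (source language, target language, text) and returns text.
Translate = Callable[[str, str, str], str]


def triangulate(phrase: str, src: str, dst: str,
                pivots: Iterable[str], translate: Translate) -> Counter:
    """Count target-language candidates from the direct route and each pivot route."""
    votes: Counter = Counter()
    votes[translate(src, dst, phrase)] += 1          # direct route, e.g. EN -> FR
    for pivot in pivots:
        via = translate(src, pivot, phrase)          # e.g. EN -> DE
        votes[translate(pivot, dst, via)] += 1       # e.g. DE -> FR
    return votes


# Hypothetical stand-in for a translation engine: entries copied from the log.
TABLE = {
    ("EN", "FR", "we will have a copy"): "nous aurons une copie",
    ("EN", "DE", "we will have a copy"): "wir haben eine Kopie",
    ("DE", "FR", "wir haben eine Kopie"): "nous avons une copie",
    ("EN", "IT", "we will have a copy"): "avremo una copia",
    ("IT", "FR", "avremo una copia"): "nous aurons une copie",
    ("EN", "NL", "we will have a copy"): "wij zullen een exemplaar hebben",
    ("NL", "FR", "wij zullen een exemplaar hebben"): "nous aurons une copie",
}


def toy_translate(src: str, dst: str, text: str) -> str:
    return TABLE[(src, dst, text)]


votes = triangulate("we will have a copy", "EN", "FR",
                    ["DE", "IT", "NL"], toy_translate)
best, count = votes.most_common(1)[0]
print(best, count)
```

With the entries shown, the direct route and two of the three pivot routes agree on the same French phrase, which is the kind of agreement the log appears to rely on before accepting a candidate.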






Appendix D - 

Example of Translation Using Target Language Flooding and Overlap 




Meaningful Machines

Testing Translation

[Screenshot of the translation test form. Item entered: 'hamas anuncio este jueves el fin de su cese d'; Source Language: Spanish; Target Language: English. The form also shows settings for bins, synonyms, signatures, minimum frame counts, overlap size, and web search, but their values are not legible in this copy.]

Starting to translate brake and over (hamas anuncio este jueves el fin de su cese del fuego 
con israel) 

{} 

x hamas anuncio este jueves was just translated and returned results 
Number of results = 1000 

Translation for hamas anuncio este jueves took 1.328 
{} 

x hamas anuncio este jueves el was just translated and returned results 
Number of results = 1000 

Translation for hamas anuncio este jueves el took 0.946 

{} 

x hamas anuncio este jueves el fin was just translated and returned results 
Number of results = 1000 

Translation for hamas anuncio este jueves el fin took 1.29 

Skipping anuncio este jueves el (2 < 2) 

x anuncio este jueves el fin was just translated and returned results 
Number of results = 306

Translation for anuncio este jueves el fin took 0.827
going to try and overlap this piece with the hashmap 
Pre 3 
Post 4 

Trying to overlap 'hamas anuncio este jueves el fin' , 'anuncio este jueves el fin'

(4,null,1) - (306)
No good source overlap 
Pre 4 
Post 2 

Trying to overlap 'hamas anuncio este jueves el' , 'anuncio este jueves el fin' (2,hamas
anuncio este jueves el fin,1) -- (306)
Got an overlap in source, checking target 
1000-306 

Overlap check for 'hamas anuncio este jueves el' , 'anuncio este jueves el fin' took 0.722 
*** hamas anuncio este jueves el (1000), (306)anuncio este jueves el fin =
hamas anuncio este jueves el fin 
1223 -> 0 



Overlappp results for hamas anuncio este jueves el fin 



1) 'hamas announced thursday , the completion' - 85 (Repeated 11 times) (hamas ,
announced thursday the::announced thursday the completion)

2) 'hamas , announced thursday the termination' - 85 (Repeated 5 times) (null)

3) 'hamas announced thursday , the end' - 85 (Repeated 4 times) (hamas , announced
thursday the::announced thursday the end)

4) 'hamas , announced thursday the end' - 85 (Repeated 9 times) (null)

5) 'hamas announced thursday , the termination' - 85 (Repeated 4 times) (hamas ,
announced thursday the::announced thursday the termination)

6) 'hamas , announced thursday the completion' - 85 (Repeated 8 times) (null)

7) 'hamas , announced thursday that the completion' - 80 (Repeated 3 times) (null)

8) 'hamas announced on thursday , the end' - 80 (Repeated 1 times) (hamas ,
announced on thursday the::announced on thursday the end)

9) 'hamas , announced thursday the end of' - 80 (Repeated 8 times) (null)

10) 'hamas announced thursday , the end of' - 80 (Repeated 3 times) (hamas ,
announced thursday the::announced thursday the end of)

11) 'of , hamas announced thursday the end' - 80 (Repeated 7 times) (null)

12) 'that , hamas announced thursday the termination' - 80 (Repeated 3 times) (null)

13) 'and , hamas announced thursday the end' - 80 (Repeated 10 times) (null)

14) 'as , hamas announced thursday the termination' - 80 (Repeated 4 times) (null)

15) 'hamas announced thursday , the termination of' - 80 (Repeated 3 times) (hamas ,
announced thursday the::announced thursday the termination of)

16) 'hamas , announced thursday the completion of' - 80 (Repeated 7 times) (null)

17) 'of , hamas announced thursday the completion' - 80 (Repeated 4 times) (null)

18) 'the , hamas announced thursday the completion' - 80 (Repeated 4 times) (null)






19) 'hamas , announced thursday is the end' - 80 (Repeated 2 times) (null)

20) 'and , hamas announced thursday the termination' - 80 (Repeated 6 times) (null)



Sorted by repetition 



1) thursday announced , the completion - 32 (Score = 65 times) 

2) thursday announced , the completion of - 26 (Score = 60 times) 

3) announced thursday , the completion - 22 (Score = 65 times) 

4) announced thursday , the completion of - 20 (Score = 60 times) 

5) on thursday announced , the completion - 16 (Score = 60 times) 

6) day , hamas announced thursday the end - 15 (Score = 65 times) 

7) thursday announced , the termination - 14 (Score = 65 times) 

8) announced on thursday , the end - 13 (Score = 60 times) 

9) day , hamas announced thursday the completion - 13 (Score = 65 times) 

10) on thursday announced , the completion of - 13 (Score = 55 times) 

11) thursday announced , the termination of - 12 (Score = 60 times)

12) announced on thursday , the completion - 12 (Score = 60 times) 

13) thursday announced , the completion of its - 12 (Score = 55 times) 

14) announced thursday , the completion of its - 12 (Score = 55 times) 

15) announced on , thursday an end - 12 (Score = 50 times) 

16) hamas announced thursday , the completion - 11 (Score = 85 times)

17) they announced , thursday the completion - 11 (Score = 60 times)

18) day , hamas announced thursday the end of - 11 (Score = 60 times)

19) announced on thursday , the end of - 10 (Score = 55 times) 

20) announced on , thursday an end to - 10 (Score = 45 times) 
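
The overlap entries above pair two chunks whose word sequences share a run of words, for example (hamas , announced thursday the::announced thursday the completion), and join them into one longer phrase. The following Python sketch illustrates one way such a join could work. It is an assumption-laden illustration, not the logged program: `merge_on_overlap` and its `min_overlap` parameter are hypothetical, and the sketch ignores the comma markers, scores, and repetition counts that the log reports.

```python
from typing import Optional


def merge_on_overlap(left: str, right: str, min_overlap: int = 2) -> Optional[str]:
    """Join two phrases when the last words of `left` equal the first words of `right`.

    Tries the longest possible shared run first and returns None when no run
    of at least `min_overlap` words is shared.
    """
    left_words = left.split()
    right_words = right.split()
    longest = min(len(left_words), len(right_words))
    for size in range(longest, max(min_overlap, 1) - 1, -1):
        if left_words[-size:] == right_words[:size]:
            # Keep the shared run once, then append the rest of the right phrase.
            return " ".join(left_words + right_words[size:])
    return None


# Source-language chunks from the log overlap on four words.
source = merge_on_overlap("hamas anuncio este jueves el",
                          "anuncio este jueves el fin")
# Target-language candidates overlap on three words.
target = merge_on_overlap("hamas announced thursday the",
                          "announced thursday the completion")
print(source)
print(target)
```

Checking the source chunks first and the target candidates second mirrors the order of the log messages ("Got an overlap in source, checking target"); a pair is kept only when both sides can be joined.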
@@@Pre2@@@ 

@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves' , 'anuncio este jueves el fin' (2,hamas
anuncio este jueves el fin,1) -- (306)
Got an overlap in source, checking target 
997 - 306 

Overlap check for 'hamas anuncio este jueves' , 'anuncio este jueves el fin' took 0.958
*** hamas anuncio este jueves (997), (306)anuncio este jueves el fin = hamas
anuncio este jueves el fin 
@@@3169 ->0 

Overlappp results for hamas anuncio este jueves el fin 



1) 'hamas announced , thursday the completion' - 85 (Repeated 11 times) (hamas ,
announced thursday:: announced thursday the completion) 

2) 'hamas , announced thursday the termination' - 85 (Repeated 5 times) (null) 

3) 'hamas , announced thursday the completion' - 85 (Repeated 8 times) (null) 

4) 'hamas announced thursday , the completion' - 85 (Repeated 11 times) (null) 

5) 'hamas announced , thursday the termination' - 85 (Repeated 4 times) (hamas , 
announced thursday: announced thursday the termination) 

6) 'hamas announced thursday , the end' - 85 (Repeated 4 times) (null) 
7) 'hamas , announced thursday the end' - 85 (Repeated 9 times) (null)

8) 'hamas announced thursday , the termination' - 85 (Repeated 4 times) (null) 

9) 'hamas announced , thursday the end' - 85 (Repeated 4 times) (hamas , announced 
thursday: announced thursday the end) 

10) 'hamas announced on , thursday the completion' - 80 (Repeated 4 times) (hamas , 
announced on thursday::announced on thursday the completion)

11) 'that , hamas announced thursday the termination' - 80 (Repeated 3 times) (null) 

12) 'hamas , announced thursday the completion of - 80 (Repeated 7 times) (null) 

13) 'the , hamas announced thursday the completion' - 80 (Repeated 4 times) (null) 

14) 'hamas , announced thursday in the finale' - 80 (Repeated 3 times) (null) 

15) 'hamas , announced on thursday the end' - 80 (Repeated 6 times) (null) 

16) 'that , hamas announced thursday the completion' - 80 (Repeated 4 times) (null) 

17) 'hamas , announced thursday and end the' - 80 (Repeated 2 times) (null) 

18) 'hamas , announced on thursday the completion' - 80 (Repeated 4 times) (null) 

19) 'the , hamas announced thursday the termination' - 80 (Repeated 4 times) (null) 

20) 'that , hamas announced thursday the end' - 80 (Repeated 7 times) (null) 



Sorted by repetition 



1) announced on , thursday an end - 18 (Score = 50 times) 

2) announced on , thursday the completion - 16 (Score = 60 times) 

3) announced thursday , the completion - 16 (Score = 65 times) 

4) day , hamas announced thursday the end - 15 (Score = 65 times) 

5) announced on , thursday the end - 15 (Score = 60 times) 

6) announced on , thursday completion - 15 (Score = 55 times) 

7) thursday announced , the completion - 14 (Score = 65 times) 

8) announced on , thursday an end to - 13 (Score = 45 times) 

9) day , hamas announced thursday the completion - 13 (Score = 65 times) 

10) announced thursday , the completion of - 13 (Score = 60 times) 

11) e announced , thursday the completion - 12 (Score = 45 times)

12) announced on , thursday the completion of - 11 (Score = 55 times)

13) hamas announced , thursday the completion - 11 (Score = 85 times)

14) announced on , thursday the termination - 11 (Score = 60 times)

15) day , hamas announced thursday the end of - 11 (Score = 60 times)

16) hamas announced thursday , the completion - 11 (Score = 85 times)

17) e announced , thursday the end - 10 (Score = 45 times) 

18) and , hamas announced thursday the end - 10 (Score = 80 times) 

19) hamas announced , thursday the completion of - 10 (Score = 80 times) 

20) announced on thursday , the completion - 10 (Score = 60 times) 

{} 

x anuncio este jueves el fin de was just translated and returned results 
Number of results = 1000 

Translation for anuncio este jueves el fin de took 1.195 
going to try and overlap this piece with the hashmap 
@@@Pre2@@@ 
@@@ Post2@@@ 
Trying to overlap 'hamas anuncio este jueves el fin' , 'anuncio este jueves el fin de'

(2,hamas anuncio este jueves el fin de,1) -- (1000)
Got an overlap in source, checking target 
1500 - 1000 

Overlap check for 'hamas anuncio este jueves el fin' , 'anuncio este jueves el fin de' took 
4.251 

*** hamas anuncio este jueves el fin (1500), (1000)anuncio este jueves el fin 
de = hamas anuncio este jueves el fin de 
### 1839 -> 1839 



Overlap results for hamas anuncio este jueves el fin de 



1) hamas announced thursday the , end of - 90 (Repeated 1 times) (hamas announced , 
thursday the end::announced thursday the end of) 

2) hamas announced thursday the , completion of - 90 (Repeated 1 times) (hamas , 
announced thursday the completion:: announced thursday the completion of) 

3) hamas announced thursday the , termination of - 90 (Repeated 1 times) (hamas 
announced thursday , the termination::announced thursday the termination of)

4) hamas announced thursday the end , of its - 85 (Repeated 1 times) (hamas 
announced , thursday the end of: announced thursday the end of its) 

5) hamas announced on thursday the , completion of - 85 (Repeated 1 times) (hamas , 
announced on thursday the completion: announced on thursday the completion of) 

6) hamas announced thursday the completion , of its - 85 (Repeated 1 times) (hamas 
announced thursday , the completion of: announced thursday the completion of its) 

7) hamas announced on thursday the , end of - 85 (Repeated 1 times) (hamas 
announced on , thursday the end::announced on thursday the end of) 

8) hamas announced thursday that completion , of the - 85 (Repeated 1 times) (hamas 
, announced thursday that completion of::announced thursday that completion of the) 

9) hamas announced thursday that by the , end of this - 85 (Repeated 1 times) (hamas 
announced thursday , that by the end::that by the end of this) 

10) hamas announced on thursday the , termination of - 85 (Repeated 1 times) (hamas 
, announced on thursday the termination: :announced on thursday the termination of) 

11) hamas announced thursday the completion , of a - 85 (Repeated 1 times) (hamas 
announced thursday , the completion of::announced thursday the completion of a)

12) hamas announced on thursday the completion , of its - 80 (Repeated 1 times)
(hamas announced on thursday , the completion of::thursday the completion of its)

13) hamas announced on thursday the end , of its - 80 (Repeated 1 times) (hamas
announced on thursday the , end of::thursday the end of its)

14) hamas announced on thursday the completion , of a - 80 (Repeated 1 times)
(hamas , announced on thursday the completion of::announced on thursday the
completion of a)

15) hamas announced thursday that , completion of - 80 (Repeated 1 times) (hamas ,
announced thursday that completion::announced thursday that completion of)

16) hamas announced thursday that at the , end of - 80 (Repeated 2 times) (hamas
announced thursday , that at the end::thursday that at the end of)

17) hamas announced on thursday , completion of - 80 (Repeated 1 times) (hamas
announced on , thursday completion::announced on thursday completion of)

18) thursday announced the completion , of this - 75 (Repeated 15 times) (thursday 
announced , the completion of::announced the completion of this) 

19) thursday announced the end , of this - 75 (Repeated 8 times) (thursday announced , 
the end of: announced the end of this) 

20) hamas announced on thursday completion , of its - 75 (Repeated 1 times) (hamas , 
announced on thursday completion of: announced on thursday completion of its) 



Sorted by repetition 



1) announced thursday the , completion of - 186 (Score = 70 times) 

2) announced thursday the , end of - 135 (Score = 70 times) 

3) announced thursday the , termination of - 98 (Score = 70 times) 

4) thursday announced the , end of - 60 (Score = 70 times) 

5) announced thursday the completion , of its - 58 (Score = 65 times) 

6) announced thursday the completion , of a - 53 (Score = 65 times) 

7) announced thursday the termination , of all - 47 (Score = 50 times) 

8) announced thursday the end , of its - 44 (Score = 65 times) 

9) thursday announced the completion , of the - 43 (Score = 65 times) 

10) on thursday announced the , end of - 42 (Score = 65 times)

11) thursday announced the , completion of - 41 (Score = 70 times)

12) on thursday announced the , completion of - 37 (Score = 65 times) 

13) thursday announced the completion , of a - 35 (Score = 65 times) 

14) thursday announced the termination , of the - 33 (Score = 65 times) 

15) announced thursday the termination , of 200 - 28 (Score = 50 times) 

16) announced thursday the end , of cash - 28 (Score = 50 times) 

17) announced thursday the end , of major - 28 (Score = 50 times) 

18) announced thursday the end , of fighting - 28 (Score = 50 times) 

19) thursday announced , completion of - 21 (Score = 65 times) 

20) e announced thursday the , completion of - 19 (Score = 50 times) 
@@@Pre2@@@ 

@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el' , 'anuncio este jueves el fin de'

(2,hamas anuncio este jueves el fin de,1) -- (1000)
Got an overlap in source, checking target 
1000 -1000 

Overlap check for 'hamas anuncio este jueves el' , 'anuncio este jueves el fin de' took 
0.979 

*** hamas anuncio este jueves el (1000), (1000)anuncio este jueves el fin de = 
hamas anuncio este jueves el fin de 
@@@ 2205 -> 0 

Overlappp results for hamas anuncio este jueves el fin de 



1) 'hamas announced thursday the , end of - 90 (Repeated 1 times) (null) 

2) 'hamas announced thursday , the end of - 90 (Repeated 3 times) (hamas , 
announced thursday the::announced thursday the end of)

3) 'hamas announced thursday , the termination of - 90 (Repeated 3 times) (hamas , 
announced thursday the::announced thursday the termination of) 

4) 'hamas announced thursday the , completion of - 90 (Repeated 1 times) (null) 

5) 'hamas announced thursday , the completion of - 90 (Repeated 10 times) (hamas ,
announced thursday the:: announced thursday the completion of) 

6) 'hamas announced thursday the , termination of - 90 (Repeated 1 times) (null) 

7) 'hamas announced on thursday , the completion of - 85 (Repeated 3 times) (hamas 
, announced on thursday the:: announced on thursday the completion of) 

8) 'hamas announced thursday the completion , of its' - 85 (Repeated 1 times) (null) 

9) 'hamas announced thursday , the completion of its' - 85 (Repeated 6 times) (hamas 
, announced thursday the::announced thursday the completion of its) 

10) 'hamas announced thursday that completion , of the' - 85 (Repeated 1 times) 
(null) 

11) 'hamas announced thursday , the completion' - 85 (Repeated 11 times) (hamas , 
announced thursday the::announced thursday the completion) 

12) 'hamas announced thursday , the end' - 85 (Repeated 4 times) (hamas , announced 
thursday the::announced thursday the end) 

13) 'hamas announced thursday the completion , of a' - 85 (Repeated 1 times) (null) 

14) 'hamas announced on thursday , the termination of - 85 (Repeated 2 times) 
(hamas , announced on thursday the: announced on thursday the termination of) 

15) 'hamas announced thursday , the end of its' - 85 (Repeated 2 times) (hamas , 
announced thursday the:: announced thursday the end of its) 

16) 'hamas announced thursday , that completion of the' - 85 (Repeated 2 times) 
(hamas , announced thursday that::announced thursday that completion of the)

17) 'hamas announced thursday the end , of its' - 85 (Repeated 1 times) (null) 

18) 'hamas announced on thursday the , completion of - 85 (Repeated 1 times) (null) 

19) 'hamas announced thursday , the termination' - 85 (Repeated 4 times) (hamas , 
announced thursday the: announced thursday the termination) 

20) 'hamas announced on thursday the , end of - 85 (Repeated 7 times) (hamas , 
announced on thursday the end::announced on thursday the end of) 



Sorted by repetition 



1) announced thursday the , end of - 123 (Score = 70 times) 

2) announced thursday the , completion of - 93 (Score = 70 times) 

3) announced thursday the , termination of - 85 (Score = 70 times) 

4) thursday announced the , end of - 41 (Score = 70 times) 

5) thursday announced the completion , of the - 34 (Score = 65 times) 

6) announced thursday the termination , of all - 33 (Score = 50 times) 

7) thursday announced , the completion - 31 (Score = 65 times) 

8) announced thursday the end , of major - 28 (Score = 50 times) 

9) announced thursday the end , of its - 28 (Score = 65 times) 

10) announced thursday the termination , of 200 - 28 (Score = 50 times) 

11) announced thursday the end , of cash - 28 (Score = 50 times) 

12) announced thursday the end , of fighting - 28 (Score = 50 times) 
13) announced , thursday the - 28 (Score = 45 times)

14) thursday announced the termination , of the - 25 (Score = 65 times) 

15) thursday announced , the completion of - 25 (Score = 70 times) 

16) on thursday announced the , end of - 25 (Score = 65 times) 

17) announced thursday the completion , of its - 24 (Score = 65 times) 

18) they announced , thursday the - 24 (Score = 40 times) 

19) announced thursday the completion , of a - 24 (Score = 65 times) 

20) announced thursday , the completion - 22 (Score = 65 times) 
@@@Pre2@@@ 

@@@Post2@@@ 

Trying to overlap 'hamas anuncio este jueves' , 'anuncio este jueves el fin de' (2,hamas
anuncio este jueves el fin de,1) -- (1000)
Got an overlap in source, checking target 
997-1000 

Overlap check for 'hamas anuncio este jueves' , 'anuncio este jueves el fin de' took 1.358 
*** hamas anuncio este jueves (997), (1000)anuncio este jueves el fin de =
hamas anuncio este jueves el fin de 
@@@ 4950 -> 0 

Overlappp results for hamas anuncio este jueves el fin de 



1) 'hamas announced thursday the , end of - 90 (Repeated 1 times) (null) 

2) 'hamas announced thursday , the end of - 90 (Repeated 3 times) (null) 

3) 'hamas announced , thursday the end of - 90 (Repeated 3 times) (hamas , 
announced thursday:: announced thursday the end of) 

4) 'hamas announced thursday , the termination of - 90 (Repeated 3 times) (null) 

5) 'hamas announced thursday the , completion of - 90 (Repeated 1 times) (null) 

6) 'hamas announced , thursday the completion of - 90 (Repeated 10 times) (hamas , 
announced thursday: announced thursday the completion of) 

7) 'hamas announced thursday , the completion of - 90 (Repeated 10 times) (null) 

8) 'hamas announced , thursday the termination of - 90 (Repeated 3 times) (hamas , 
announced thursday: :announced thursday the termination of) 

9) 'hamas announced thursday the , termination of - 90 (Repeated 1 times) (null) 

10) 'hamas announced , thursday the completion' - 85 (Repeated 11 times) (hamas , 
announced thursday:: announced thursday the completion) 

11) 'hamas announced on thursday , the completion of - 85 (Repeated 3 times) (null) 

12) 'hamas announced thursday the completion , of its' - 85 (Repeated 1 times) (null) 

13) 'hamas announced thursday , the completion of its' - 85 (Repeated 6 times) (null) 

14) 'hamas announced thursday that completion , of the' - 85 (Repeated 1 times) 
(null) 

15) 'hamas announced thursday , the completion' - 85 (Repeated 11 times) (null) 

16) 'hamas announced , thursday the termination' - 85 (Repeated 4 times) (hamas , 
announced thursday:: announced thursday the termination) 

17) 'hamas announced thursday , the end' - 85 (Repeated 4 times) (null) 

18) 'hamas announced thursday the completion , of a' - 85 (Repeated 1 times) (null) 

19) 'hamas announced on , thursday the end of - 85 (Repeated 6 times) (hamas , 
announced on thursday::announced on thursday the end of)

20) 'hamas announced on thursday , the termination of - 85 (Repeated 2 times) (null) 



Sorted by repetition 



1) announced , thursday the - 431 (Score = 45 times) 

2) announced thursday the , completion of - 93 (Score = 70 times) 

3) announced thursday the , end of - 66 (Score = 70 times) 

4) announced thursday the , termination of - 47 (Score = 70 times) 

5) hamas announced , thursday the - 41 (Score = 65 times) 

6) thursday , announced the - 38 (Score = 45 times) 

7) announced thursday the end , of its - 27 (Score = 65 times) 

8) announced thursday , the completion - 24 (Score = 65 times) 

9) announced thursday the completion , of its - 24 (Score = 65 times) 

10) thursday announced , the completion - 23 (Score = 65 times) 

11) announced thursday , that completion - 23 (Score = 55 times) 

12) announced thursday the completion , of a - 22 (Score = 65 times) 

13) announced thursday , the completion of - 21 (Score = 70 times) 

14) announced thursday , that completion of - 21 (Score = 60 times) 

15) announced thursday , that completion of the - 19 (Score = 65 times) 

16) announced on , thursday the end - 19 (Score = 60 times) 

17) thursday announced , the completion of - 18 (Score = 70 times) 

18) announced on , thursday the completion - 17 (Score = 60 times) 

19) thursday announced the completion , of the - 16 (Score = 65 times) 

20) announced on , thursday completion - 16 (Score = 55 times) 

Skipping este jueves el fin (2 < 2) 
Skipping este jueves el fin de (2 < 2) 
Skipping este jueves el fin de su (2 < 2) 
Skipping jueves el fin de (2 < 2) 

Skipping jueves el fin de su (2 < 2) 

. {} 

x jueves el fin de su cese was just translated and returned results 
Number of results = 998 

Translation for jueves el fin de su cese took 1.205 
going to try and overlap this piece with the hashmap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de' , 'jueves el fin de su cese'
(2,hamas anuncio este jueves el fin de su cese,3) -- (998)
Got an overlap in source, checking target 
1500 - 998 

Overlap check for 'hamas anuncio este jueves el fin de' , 'jueves el fin de su cese' took
1.705

*** hamas anuncio este jueves el fin de (1500), (998)jueves el fin de su cese =
hamas anuncio este jueves el fin de su cese
### 1235 -> 1235 
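The source-side step logged above joins two Spanish fragments when the end of the first matches the start of the second. A minimal sketch of that join is given here, assuming word-level matching and a minimum overlap of two words. The two-word minimum is an inference from the trace (fragments sharing only one word are reported as "No good source overlap"), and the function name `merge_on_overlap` is hypothetical.

```python
def merge_on_overlap(left, right, min_overlap=2):
    """Join two phrases when a suffix of `left` equals a prefix of `right`.

    Returns the merged phrase, or None when fewer than `min_overlap`
    words are shared at the boundary.
    """
    lw, rw = left.split(), right.split()
    # Try the longest possible shared run of words first.
    for k in range(min(len(lw), len(rw)), min_overlap - 1, -1):
        if lw[-k:] == rw[:k]:
            return " ".join(lw + rw[k:])
    return None


if __name__ == "__main__":
    print(merge_on_overlap("hamas anuncio este jueves el fin de",
                           "jueves el fin de su cese"))
    print(merge_on_overlap("hamas anuncio este jueves",
                           "jueves el fin de su cese"))
```

The first call reproduces the merged source phrase shown in the log; the second shares only the single word "jueves" and therefore yields no merge under the assumed threshold.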



Overlap results for hamas anuncio este jueves el fin de su cese 



1) hamas announced thursday the termination , of cease - 110 (Repeated 3 times)
(hamas announced thursday the , termination of::thursday the termination of cease)

2) hamas announced thursday the end , of cease - 110 (Repeated 2 times) (hamas
announced , thursday the end of::thursday the end of cease)

3) hamas announced thursday the completion , of cease - 110 (Repeated 2 times)
(hamas announced thursday the , completion of::thursday the completion of cease)

4) hamas announced on thursday the end , of cease - 105 (Repeated 2 times) (hamas
announced on thursday the , end of::thursday the end of cease)

5) hamas announced thursday the termination , of cease and - 105 (Repeated 2 times)
(hamas announced thursday the , termination of::thursday the termination of cease and)

6) hamas announced thursday the end , of the cease - 105 (Repeated 3 times) (hamas
announced , thursday the end of::thursday the end of the cease)

7) hamas announced on thursday the termination , of cease - 105 (Repeated 3 times)
(hamas announced on thursday , the termination of::thursday the termination of cease)

8) hamas announced on thursday the completion , of cease - 105 (Repeated 2 times)
(hamas announced on thursday , the completion of::thursday the completion of cease)

9) hamas announced on thursday the termination , of cease and - 100 (Repeated 2
times) (hamas announced on thursday , the termination of::thursday the termination of
cease and)

10) hamas announced on thursday completion , of cease - 100 (Repeated 2 times)
(hamas announced on thursday , completion of::thursday completion of cease)

11) hamas announced on thursday the end , of the cease - 100 (Repeated 3 times)
(hamas announced on thursday the , end of::thursday the end of the cease)

12) hamas announced thursday the end of , its unilateral cease - 95 (Repeated 2
times) (hamas announced thursday , the end of its::thursday the end of its unilateral
cease)

13) hamas announced thursday the successful completion , of cease - 90 (Repeated 1
times) (hamas announced thursday the successful , completion of::thursday the successful
completion of cease)

14) hamas announced thursday the , end of - 90 (Repeated 1 times) (hamas announced
, thursday the end::thursday the end of)

15) hamas announced on thursday the end of , its unilateral cease - 90 (Repeated 2
times) (hamas announced on thursday the end , of its::thursday the end of its unilateral
cease)

16) announced thursday the completion , of cease - 90 (Repeated 94 times) (announced
thursday , the completion of::thursday the completion of cease)

17) hamas announced thursday the end , of cease fire - 90 (Repeated 1 times) (hamas
announced , thursday the end of::thursday the end of cease fire)

18) announced thursday the end , of cease - 90 (Repeated 94 times) (announced
thursday the , end of::thursday the end of cease)

19) announced thursday the termination , of cease - 90 (Repeated 141 times)
(announced thursday , the termination of::thursday the termination of cease)

20) hamas announced thursday the completion , of cease project - 90 (Repeated 1
times) (hamas announced thursday the , completion of::thursday the completion of cease
project)



Sorted by repetition 



1) announced thursday the end , of the - 188 (Score = 65 times) 

2) announced thursday the termination , of cease - 141 (Score = 90 times) 

3) announced thursday the end , of the cease - 141 (Score = 85 times) 

4) announced thursday the termination , of cease and - 94 (Score = 85 times) 

5) announced thursday the end of , its unilateral cease - 94 (Score = 75 times) 

6) announced thursday the end , of the cease fire - 94 (Score = 65 times) 

7) announced thursday the completion , of cease - 94 (Score = 90 times) 

8) announced thursday the end , of cease - 94 (Score = 90 times) 

9) announced thursday the end , of cash - 47 (Score = 50 times) 

10) announced thursday the termination , of cease and desist - 47 (Score = 65 times) 

11) announced thursday the end , of cease fire - 47 (Score = 70 times) 

12) announced thursday the completion , of cease project - 47 (Score = 70 times) 

13) announced thursday the end of , its unilateral cease fire - 47 (Score = 55 times) 

14) announced thursday the end , of the cease fire which - 47 (Score = 60 times) 

15) announced thursday the end of , its annual - 46 (Score = 55 times) 

16) thursday announced that by the end , of thursday - 45 (Score = 40 times) 

17) announced thursday the , end of - 44 (Score = 70 times) 

18) announced on thursday the end , of the - 24 (Score = 60 times) 

19) announced on thursday the termination , of cease - 21 (Score = 85 times) 

20) e announced thursday the end , of the - 20 (Score = 45 times) 
@@@Pre2@@@ 

@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin' , 'jueves el fin de su cese' (2,hamas
anuncio este jueves el fin de su cese,3) -- (998)
Got an overlap in source, checking target 
1500 - 998

Overlap check for 'hamas anuncio este jueves el fin' , 'jueves el fin de su cese' took 1.531 
*** hamas anuncio este jueves el fin (1500), (998)jueves el fin de su cese = 
hamas anuncio este jueves el fin de su cese 
@@@ 1581 



Overlappp results for hamas anuncio este jueves el fin de su cese 



1) 'hamas announced thursday the end , of cease' - 110 (Repeated 2 times) (hamas
announced , thursday the end of::thursday the end of cease)

2) 'hamas announced thursday the termination , of cease' - 110 (Repeated 3 times)
(hamas announced , thursday the termination of::thursday the termination of cease)

3) 'hamas announced thursday the completion , of cease' - 110 (Repeated 2 times)
(hamas announced thursday , the completion of::thursday the completion of cease)

4) 'hamas announced on thursday the termination , of cease' - 105 (Repeated 3 times)
(hamas announced on thursday , the termination of::thursday the termination of cease)

5) 'hamas announced thursday the end , of the cease' - 105 (Repeated 3 times) (hamas
announced , thursday the end of::thursday the end of the cease)

6) 'hamas announced on thursday the completion , of cease' - 105 (Repeated 2 times)
(hamas announced on thursday , the completion of::thursday the completion of cease)

7) 'hamas announced on thursday the end , of cease' - 105 (Repeated 2 times) (hamas
announced on thursday the , end of::thursday the end of cease)

8) 'hamas announced thursday the termination , of cease and' - 105 (Repeated 2
times) (hamas announced , thursday the termination of::thursday the termination of cease
and)

9) 'hamas announced on thursday completion , of cease' - 100 (Repeated 2 times)
(hamas announced on , thursday completion of::thursday completion of cease)

10) 'hamas announced on thursday the end , of the cease' - 100 (Repeated 3 times)
(hamas announced on thursday the , end of::thursday the end of the cease)

11) 'hamas announced on thursday the termination , of cease and' - 100 (Repeated 2
times) (hamas announced on thursday , the termination of::thursday the termination of
cease and)

12) 'hamas announced thursday the end of , its unilateral cease' - 95 (Repeated 2
times) (hamas announced thursday , the end of its::thursday the end of its unilateral
cease)

13) 'hamas announced on thursday the end , of its unilateral cease' - 90 (Repeated 2
times) (hamas announced on thursday the , end of::thursday the end of its unilateral
cease)

14) 'hamas announced on thursday the end of , its unilateral cease' - 90 (Repeated 2
times) (null)

15) 'hamas announced thursday the end , of cease fire' - 90 (Repeated 1 times) (hamas
announced , thursday the end of::thursday the end of cease fire)

16) 'announced thursday the termination , of cease' - 90 (Repeated 141 times)
(announced thursday , the termination of::thursday the termination of cease)

17) 'hamas announced thursday the completion , of cease project' - 90 (Repeated 1
times) (hamas announced thursday , the completion of::thursday the completion of cease
project)

18) 'hamas announced thursday the successful completion , of cease' - 90 (Repeated 1
times) (hamas announced thursday , the successful completion of::thursday the successful
completion of cease)

19) 'hamas announced thursday the , end of' - 90 (Repeated 1 times) (hamas
announced , thursday the end::thursday the end of)

20) 'announced thursday the completion , of cease' - 90 (Repeated 94 times)
(announced thursday , the completion of::thursday the completion of cease)



Sorted by repetition 



1) announced thursday the , end of - 211 (Score = 70 times)

2) announced thursday the end , of the - 188 (Score = 65 times) 

3) announced thursday the termination , of cease - 141 (Score = 90 times)

4) announced thursday the end , of the cease - 141 (Score = 85 times)

5) announced thursday the end of , its unilateral cease - 94 (Score = 75 times)

6) announced thursday the termination , of cease and - 94 (Score = 85 times) 

7) announced thursday the completion , of cease - 94 (Score = 90 times) 

8) announced thursday the end , of cease - 94 (Score = 90 times) 

9) announced thursday the end , of the cease fire - 94 (Score = 65 times) 

10) announced thursday the end of , its unilateral cease fire - 47 (Score = 55 times) 

11) announced thursday the termination , of cease and desist - 47 (Score = 65 times)

12) announced thursday the end , of the cease fire which - 47 (Score = 60 times) 

13) announced thursday the end , of cease fire - 47 (Score = 70 times) 

14) announced thursday the completion , of cease project - 47 (Score = 70 times) 

15) announced thursday the end of , its annual - 46 (Score = 55 times)

16) announced thursday the end , of cash - 29 (Score = 50 times) 

17) announced on thursday the end , of the - 24 (Score = 60 times) 

18) e announced thursday the , end of - 22 (Score = 50 times) 

19) announced on thursday the termination , of cease - 21 (Score = 85 times) 

20) e announced thursday the end , of the - 20 (Score = 45 times) 
@@@Pre2@@@

@@@ Post2@@@

Trying to overlap 'hamas anuncio este jueves el' , 'jueves el fin de su cese' (2,hamas
anuncio este jueves el fin de su cese,3) -- (998)
Got an overlap in source, checking target 
1000-998 

Overlap check for 'hamas anuncio este jueves el' , 'jueves el fin de su cese' took 1.348
*** hamas anuncio este jueves el (1000), (998)jueves el fin de su cese = hamas 
anuncio este jueves el fin de su cese 
@@@ 1512 ->0 

Overlappp results for hamas anuncio este jueves el fin de su cese 



1) 'hamas announced thursday the end , of cease' - 110 (Repeated 2 times) (null)

2) 'hamas announced thursday the termination , of cease' - 110 (Repeated 3 times) 
(null) 

3) 'hamas announced thursday the completion , of cease' - 110 (Repeated 2 times)
(null) 

4) 'hamas announced on thursday the termination , of cease' - 105 (Repeated 3 times) 
(null) 

5) 'hamas announced thursday the end , of the cease' - 105 (Repeated 3 times) (null) 

6) 'hamas announced on thursday the completion , of cease' - 105 (Repeated 2 times) 
(null) 

7) 'hamas announced on thursday the end , of cease' - 105 (Repeated 2 times) (null) 

8) 'hamas announced thursday the termination , of cease and' - 105 (Repeated 2 
times) (null) 

9) 'hamas announced on thursday completion , of cease' - 100 (Repeated 2 times) 
(null) 

10) 'hamas announced on thursday the end , of the cease' - 100 (Repeated 3 times)
(null)

11) 'hamas announced on thursday the termination , of cease and' - 100 (Repeated 2 
times) (null) 

12) 'hamas announced thursday the end of , its unilateral cease' - 95 (Repeated 2
times) (null) 

13) 'hamas announced on thursday the end , of its unilateral cease' - 90 (Repeated 2 
times) (null) 

14) 'hamas announced on thursday the end of , its unilateral cease' - 90 (Repeated 2 
times) (null) 

15) 'hamas announced thursday the end , of cease fire' - 90 (Repeated 1 times) (null) 

16) 'announced thursday the termination , of cease' - 90 (Repeated 141 times) (null) 

17) 'hamas announced thursday the completion , of cease project' - 90 (Repeated 1 
times) (null) 

18) 'hamas announced thursday the successful completion , of cease' - 90 (Repeated 1 
times) (null) 

19) 'hamas announced thursday the , end of' - 90 (Repeated 1 times) (null)

20) 'announced thursday the completion , of cease' - 90 (Repeated 94 times) (null) 



Sorted by repetition 



1) announced thursday the , end of - 207 (Score = 70 times) 

2) announced thursday the end , of the - 188 (Score = 65 times) 

3) announced thursday the termination , of cease - 141 (Score = 90 times) 

4) announced thursday the end , of the cease - 141 (Score = 85 times) 

5) announced thursday the end of , its unilateral cease - 94 (Score = 75 times) 

6) announced thursday the termination , of cease and - 94 (Score = 85 times) 

7) announced thursday the completion , of cease - 94 (Score = 90 times) 

8) announced thursday the end , of cease - 94 (Score = 90 times) 

9) announced thursday the end , of the cease fire - 94 (Score = 65 times) 

10) announced thursday the end of , its unilateral cease fire - 47 (Score = 55 times)

11) announced thursday the termination , of cease and desist - 47 (Score = 65 times) 

12) announced thursday the end , of the cease fire which - 47 (Score = 60 times) 

13) announced thursday the end , of cease fire - 47 (Score = 70 times) 

14) announced thursday the completion , of cease project - 47 (Score = 70 times) 

15) announced thursday the end of , its annual - 46 (Score = 55 times)

16) announced on thursday the end , of the - 24 (Score = 60 times) 

17) e announced thursday the , end of - 22 (Score = 50 times) 

18) announced thursday the end , of cash - 22 (Score = 50 times) 

19) announced on thursday the termination , of cease - 21 (Score = 85 times) 

20) e announced thursday the end , of the - 20 (Score = 45 times) 
@@@Pre2@@@ 

@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves' , 'jueves el fin de su cese' (2,null,3) --
(998)

No good source overlap 

Skipping el fin de su (2 < 1)
Skipping el fin de su cese (2 < 2)
Skipping el fin de su cese del (2 < 2) 
Skipping fin de su cese (2 < 2) 
Skipping fin de su cese del (2 < 2) 

x fin de su cese del fuego was just translated and returned results 
Number of results = 999 

Translation for fin de su cese del fuego took 1.246 
going to try and overlap this piece with the hashmap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de' , 'fin de su cese del fuego'
(2,hamas anuncio este jueves el fin de su cese del fuego,5) -- (999)
Got an overlap in source, checking target 
1500-999 

Overlap check for 'hamas anuncio este jueves el fin de' , 'fin de su cese del fuego' took
2.114

*** hamas anuncio este jueves el fin de (1500), (999)fin de su cese del fuego =
hamas anuncio este jueves el fin de su cese del fuego

### 218 -> 218



Overlap results for hamas anuncio este jueves el fin de su cese del fuego 



1) hamas announced thursday the end of , its unilateral cease fire - 115 (Repeated 1
times) (hamas announced thursday the end , of its::end of its unilateral cease fire)

2) hamas announced on thursday the end of , its unilateral cease fire - 110 (Repeated
1 times) (hamas announced on thursday the end , of its::end of its unilateral cease fire)

3) thursday announced the end of , the cease fire - 105 (Repeated 20 times) (thursday
announced , the end of the::the end of the cease fire)

4) which thursday announced the end of , the cease fire - 100 (Repeated 4 times)
(which thursday announced the end , of the::the end of the cease fire)

5) on thursday announced the end of , the cease fire - 100 (Repeated 4 times) (on
thursday announced , the end of the::the end of the cease fire)

6) thursday announced the end of , the cease fire which - 100 (Repeated 15 times)
(thursday announced the end , of the::end of the cease fire which)

7) thursday announced the end of , its unilateral cease fire - 95 (Repeated 4 times)
(thursday announced the end , of its::end of its unilateral cease fire)

8) hamas announced thursday the end of , its unilateral cease - 95 (Repeated 2 times)
(hamas announced thursday the end , of its::end of its unilateral cease)

9) announced thursday the end of , its unilateral cease fire - 95 (Repeated 46 times)
(announced thursday the end , of its::end of its unilateral cease fire)

10) which thursday announced the end of , the cease fire which - 95 (Repeated 3
times) (which thursday announced the end , of the::end of the cease fire which)

11) on thursday announced the end of , the cease fire which - 95 (Repeated 3 times)
(on thursday announced the end , of the::end of the cease fire which)

12) thursday announced the end of , his light - 95 (Repeated 6 times) (thursday
announced the end , of his::the end of his light)

13) which thursday announced the end of , its unilateral cease fire - 90 (Repeated 1
times) (which thursday announced the end , of its::end of its unilateral cease fire)

14) on thursday announced the end of , its unilateral cease fire - 90 (Repeated 1
times) (on thursday announced the end , of its::end of its unilateral cease fire)

15) on thursday announced the end of , his light - 90 (Repeated 2 times) (on thursday
announced the end , of his::the end of his light)

16) they announced thursday the end of , its unilateral cease fire - 90 (Repeated 1
times) (they announced thursday the end , of its::end of its unilateral cease fire)

17) and announced thursday the end of , its unilateral cease fire - 90 (Repeated 1
times) (and announced thursday the end , of its::end of its unilateral cease fire)

18) were announced thursday the end of , its unilateral cease fire - 90 (Repeated 1
times) (were announced thursday the end , of its::end of its unilateral cease fire)

19) was announced thursday the end of , its unilateral cease fire - 90 (Repeated 1
times) (was announced thursday the end , of its::end of its unilateral cease fire)

20) be announced thursday the end of , its unilateral cease fire - 90 (Repeated 1
times) (be announced thursday the end , of its::end of its unilateral cease fire)



Sorted by repetition 



1) announced thursday the end of , its unilateral cease - 92 (Score = 75 times) 

2) announced thursday the end of , its unilateral cease fire - 46 (Score = 95 times)

3) thursday announced the end of , the fire - 40 (Score = 85 times)

4) thursday announced the end of , the cease - 25 (Score = 85 times)

5) thursday announced the end of , the cease fire - 20 (Score = 105 times) 

6) thursday announced the end of , the fire and - 15 (Score = 80 times) 

7) thursday announced the end of , the unconditional cease fire - 15 (Score = 85 
times) 

8) thursday announced the end of , the cease fire which - 15 (Score = 100 times)

9) thursday announced the end of , a 14-month cease - 10 (Score = 65 times)

10) thursday announced the end of , the unconditional cease fire that - 10 (Score = 80 
times) 

11) thursday announced the end of , the fire his - 10 (Score = 90 times) 

12) thursday announced the end of , the cease fire which ended - 10 (Score = 80 
times) 

13) thursday announced the end of , the fire and his - 10 (Score = 85 times) 

14) announced on thursday the end of , its unilateral cease - 10 (Score = 70 times)

15) e announced thursday the end of , its unilateral cease - 10 (Score = 55 times)

16) thursday announced the end of , the hearth - 10 (Score = 85 times) 

17) thursday announced the end of , its unilateral cease - 8 (Score = 75 times) 

18) on thursday announced the end of , the fire - 8 (Score = 80 times) 

19) officials thursday announced the end of , the fire - 8 (Score = 65 times) 

20) which thursday announced the end of , the fire - 8 (Score = 80 times) 
@@@Pre2@@@
@@@ Post2@@@

Trying to overlap 'hamas anuncio este jueves el fin' , 'fin de su cese del fuego'
(2,null,5) -- (999)

No good source overlap 

@@@Pre2@@@ 

@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el' , 'fin de su cese del fuego' (2,null,5) --
(999)

No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese' , 'fin de su cese del
fuego' (2,hamas anuncio este jueves el fin de su cese del fuego,5) -- (999)
Got an overlap in source, checking target 
1500-999 

Overlap check for 'hamas anuncio este jueves el fin de su cese' , 'fin de su cese del fuego' 
took 2.737 

*** hamas anuncio este jueves el fin de su cese (1500), (999)fin de su cese del 
fuego = hamas anuncio este jueves el fin de su cese del fuego 
@@@ 3369 -> 0 

Overlappp results for hamas anuncio este jueves el fin de su cese del fuego 



1) 'hamas announced thursday the end of , cease fire' - 130 (Repeated 1 times) (hamas
announced thursday the end , of cease::end of cease fire)

2) 'hamas announced thursday the end of cease , fire the' - 125 (Repeated 2 times)
(hamas announced thursday the end , of cease fire::of cease fire the)

3) 'hamas announced thursday the end of the , cease fire' - 125 (Repeated 1 times)
(hamas announced thursday the end , of the cease::the end of the cease fire)

4) 'hamas announced thursday the end of cease , fire it' - 125 (Repeated 2 times)
(hamas announced thursday the end , of cease fire::of cease fire it)

5) 'hamas announced thursday the end of cease , fire by' - 125 (Repeated 3 times) 
(hamas announced thursday the end , of cease fire::of cease fire by) 

6) 'hamas announced thursday the end of cease , fire in' - 125 (Repeated 3 times) 
(hamas announced thursday the end , of cease fire::of cease fire in) 

7) 'hamas announced thursday the end of cease , fire was' - 125 (Repeated 2 times) 
(hamas announced thursday the end , of cease fire::of cease fire was) 

8) 'hamas announced on thursday the end of , cease fire' - 125 (Repeated 1 times)
(hamas announced on thursday the end , of cease::end of cease fire)

9) 'hamas announced thursday the end of cease , fire or' - 125 (Repeated 2 times) 
(hamas announced thursday the end , of cease fire::of cease fire or) 

10) 'hamas announced thursday the end of cease , fire and' - 125 (Repeated 1 times) 
(hamas announced thursday the end , of cease fire::of cease fire and) 

11) 'hamas announced thursday the end of cease , fire is' - 125 (Repeated 2 times) 
(hamas announced thursday the end , of cease fire::of cease fire is) 

12) 'hamas announced thursday the end of cease , fire for' - 125 (Repeated 1 times)
(hamas announced thursday the end , of cease fire::of cease fire for)

13) 'hamas announced on thursday the end of cease , fire by' - 120 (Repeated 3 times) 
(hamas announced on thursday the end , of cease fire::of cease fire by) 

14) 'hamas announced on thursday the end of cease , fire the' - 120 (Repeated 2 
times) (hamas announced on thursday the end , of cease fire::of cease fire the) 

15) 'hamas announced thursday the end of cease , fire by the' - 120 (Repeated 1
times) (hamas announced thursday the end , of cease fire::of cease fire by the)

16) 'hamas announced on thursday the end of cease , fire is' - 120 (Repeated 2 times)
(hamas announced on thursday the end , of cease fire::of cease fire is)

17) 'hamas announced on thursday the end of cease , fire and' - 120 (Repeated 1 
times) (hamas announced on thursday the end , of cease fire::of cease fire and) 

18) 'hamas announced thursday the end of cease , fire in the' - 120 (Repeated 1 times)
(hamas announced thursday the end , of cease fire::of cease fire in the)

19) 'hamas announced thursday the end of cease , fire it has' - 120 (Repeated 1 times) 
(hamas announced thursday the end , of cease fire::of cease fire it has) 

20) 'hamas announced on thursday the end of cease , fire in' - 120 (Repeated 3 times)
(hamas announced on thursday the end , of cease fire::of cease fire in)



Sorted by repetition 



1) announced thursday the end of cease , fire in - 101 (Score = 105 times) 

2) announced thursday the end of cease , fire by - 101 (Score = 105 times) 

3) announced thursday the end of cease , fire it - 94 (Score = 105 times) 

4) announced thursday the end of cease , fire or - 94 (Score = 105 times) 

5) announced thursday the end of cease , fire was - 94 (Score = 105 times) 

6) announced thursday the end of the cease , fire at - 74 (Score = 100 times) 

7) announced thursday the end of cease , fire the - 54 (Score = 105 times) 

8) announced thursday the end of cease , fire is - 54 (Score = 105 times) 

9) announced thursday the end of the cease , fire to - 47 (Score = 100 times) 

10) announced thursday the end of cease , fire and - 47 (Score = 105 times) 

11) announced thursday the end of , cease fire - 47 (Score = 110 times) 

12) announced thursday the end of cease , fire in the - 47 (Score = 100 times) 

13) announced thursday the end of cease , fire for - 47 (Score = 105 times) 

14) announced thursday the end of the cease , fire which - 47 (Score = 100 times) 

15) announced thursday the end of cease , fire by the - 47 (Score = 100 times) 

16) announced thursday the end of cease , fire was the - 47 (Score = 100 times) 

17) announced thursday the end of cease , fire or what - 47 (Score = 100 times) 

18) announced thursday the end of the , cease fire - 47 (Score = 105 times) 

19) announced thursday the end of cease , fire it has - 47 (Score = 100 times) 

20) announced thursday the end of its unilateral , cease fire - 30 (Score = 95 times) 
@@@Pre2@@@ 

@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves' , 'fin de su cese del fuego' (2,null,5) --
(999)

No good source overlap 

Skipping de su cese del (2 < 1)
Skipping de su cese del fuego (2 < 2)

x de su cese del fuego con was just translated and returned results 
Number of results = 1000 

Translation for de su cese del fuego con took 1.176 
going to try and overlap this piece with the hashmap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de' , 'de su cese del fuego con'
(2,null,6) -- (1000)
No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin' , 'de su cese del fuego con'
(2,null,6) -- (1000)
No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el' , 'de su cese del fuego con' (2,null,6) --
(1000)

No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego' , 'de su cese
del fuego con' (2,hamas anuncio este jueves el fin de su cese del fuego con,6) -- (1000)
Got an overlap in source, checking target 
1500-1000 

Overlap check for 'hamas anuncio este jueves el fin de su cese del fuego' , 'de su cese del
fuego con' took 6.308

*** hamas anuncio este jueves el fin de su cese del fuego (1500), (1000)de su 
cese del fuego con = hamas anuncio este jueves el fin de su cese del fuego con 
### 16233 -> 16233 



Overlap results for hamas anuncio este jueves el fin de su cese del fuego con 



1) hamas announced thursday the end of cease , fire with their - 140 (Repeated 4 
times) (hamas announced thursday the end of, cease fire::cease fire with their) 

2) hamas announced thursday the end of cease , fire with - 135 (Repeated 21 times) 
(hamas announced thursday the end of, cease fire::of cease fire with) 

3) hamas announced on thursday the end of cease , fire with their - 135 (Repeated 4 
times) (hamas announced on thursday the end of, cease fire::cease fire with their) 

4) announced thursday the end of cease , fire with hamas - 135 (Repeated 94 times) 
(announced thursday the end of, cease fire::cease fire with hamas) 

5) hamas announced thursday the end of the cease , fire with their - 135 (Repeated 4 
times) (hamas announced thursday the end of the , cease fire::the cease fire with their) 

6) be announced thursday the end of cease , fire with hamas - 130 (Repeated 2 times)
(be announced thursday the end of , cease fire::cease fire with hamas)

7) hamas announced on thursday the end of cease , fire with - 130 (Repeated 21
times) (hamas announced on thursday the end of , cease fire::of cease fire with)

8) announced thursday the end of cease , fire with hamas and - 130 (Repeated 47 
times) (announced thursday the end of, cease fire::cease fire with hamas and) 

9) and announced thursday the end of cease , fire with hamas - 130 (Repeated 4 
times) (and announced thursday the end of, cease fire::cease fire with hamas) 

10) announced on thursday the end of cease , fire with hamas - 130 (Repeated 12 
times) (announced on thursday the end of , cease fire::cease fire with hamas) 

11) announced thursday the end of the cease , fire with hamas - 130 (Repeated 94
times) (announced thursday the end of the , cease fire::cease fire with hamas)

12) hamas announced thursday the end of the cease , fire with - 130 (Repeated 21 
times) (hamas announced thursday the end of the , cease fire::the cease fire with) 

13) hamas announced thursday the end of cease , fire with the - 130 (Repeated 13 
times) (hamas announced thursday the end of, cease fire::of cease fire with the) 

14) hamas announced on thursday the end of the cease , fire with their - 130 
(Repeated 4 times) (hamas announced on thursday the end of the , cease fire::the cease 
fire with their) 

15) they announced thursday the end of cease , fire with hamas - 130 (Repeated 2
times) (they announced thursday the end of , cease fire::cease fire with hamas)

16) were announced thursday the end of cease , fire with hamas - 130 (Repeated 2 
times) (were announced thursday the end of, cease fire::cease fire with hamas) 

17) hamas announced thursday the end of cease , fire with them - 130 (Repeated 1 
times) (hamas announced thursday the end of, cease fire::cease fire with them) 

18) was announced thursday the end of cease , fire with hamas - 130 (Repeated 2 
times) (was announced thursday the end of, cease fire::cease fire with hamas) 

19) thursday announced the end of the cease fire , with hamas - 130 (Repeated 10 
times) (thursday announced the end of the cease , fire with::cease fire with hamas) 

20) hamas announced thursday the end of cease , fire as - 125 (Repeated 3 times) 
(hamas announced thursday the end of, cease fire::cease fire as) 



Sorted by repetition 



1) announced thursday the end of cease , fire with - 246 (Score = 115 times) 

2) announced thursday the end of the cease , fire with - 186 (Score = 110 times) 

3) announced thursday the end of cease , fire with hamas - 94 (Score = 135 times) 

4) announced thursday the end of cease , fire with the - 94 (Score = 110 times) 

5) announced thursday the end of the cease , fire with hamas - 94 (Score = 130 times) 

6) announced thursday the end of its unilateral cease , fire with - 86 (Score = 100 
times) 

7) announced thursday the end of the cease , fire with the - 74 (Score = 105 times) 

8) announced thursday the end of cease , fire with their - 64 (Score = 120 times) 

9) announced thursday the end of its unilateral cease , fire with hamas - 60 (Score = 
120 times) 

10) announced thursday the end of the cease , fire with their - 53 (Score = 115 times) 

11) announced thursday the end of the cease , fire a - 51 (Score = 100 times) 

12) announced on thursday the end of cease , fire with - 51 (Score = 110 times) 

13) announced thursday the end of cease , fire a - 49 (Score = 105 times) 

14) announced on thursday the end of the cease , fire with - 47 (Score = 105 times) 

15) announced thursday the end of the cease , fire with hamas and - 47 (Score = 125 
times) 

16) announced thursday the end of cease , fire with hamas and - 47 (Score = 130 
times) 

17) announced on thursday the end of cease , fire a - 33 (Score = 100 times) 

18) announced on thursday the end of the cease , fire a - 32 (Score = 95 times) 

19) hamas announced thursday the end of the cease , fire a - 30 (Score = 120 times) 

20) announced thursday the end of its unilateral cease , fire with hamas and - 30 
(Score = 115 times) 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese' , 'de su cese del fuego 
con' (2,hamas anuncio este jueves el fin de su cese del fuego con,6) -- (1000) 
Got an overlap in source, checking target 
1500 - 1000 

Overlap check for 'hamas anuncio este jueves el fin de su cese' , 'de su cese del fuego con' 
took 3.087 

*** hamas anuncio este jueves el fin de su cese (1500), (1000)de su cese del 
fuego con = hamas anuncio este jueves el fin de su cese del fuego con 
@@@ 17704 ->0 
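
The source-side step logged above (two Spanish fragments joined where the end of one repeats the start of the other) can be sketched as follows. This is a minimal illustration, not the system's actual code: the function name `merge_on_overlap`, the default minimum overlap of two words, and the rule that the right-hand piece must contribute at least one new word are all assumptions chosen so that the sketch reproduces the "Got an overlap in source" and "No good source overlap" outcomes visible in this log.

```python
from typing import Optional


def merge_on_overlap(left: str, right: str, min_overlap: int = 2) -> Optional[str]:
    """Join two phrases when the last words of `left` equal the first words of `right`.

    Returns the merged phrase, or None if no overlap of at least
    `min_overlap` words is found. The overlap must be a proper prefix of
    `right`, so that `right` adds at least one new word (an assumption).
    """
    lw, rw = left.split(), right.split()
    lowest = max(min_overlap, 1)  # an overlap of zero words is never accepted
    # Try the longest candidate overlap first and shrink toward the minimum.
    for k in range(min(len(lw), len(rw) - 1), lowest - 1, -1):
        if lw[-k:] == rw[:k]:
            return " ".join(lw + rw[k:])
    return None


merged = merge_on_overlap(
    "hamas anuncio este jueves el fin de su cese",
    "de su cese del fuego con",
)
no_overlap = merge_on_overlap("hamas anuncio este jueves", "de su cese del fuego con")
```

Here `merged` is the full phrase "hamas anuncio este jueves el fin de su cese del fuego con" and `no_overlap` is `None`. The sketch covers only the source side; the target-side check that follows in the log is not modelled.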

Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con 



1) 'hamas announced thursday the end of cease , fire with their' - 140 (Repeated 4 
times) (null) 

2) 'hamas announced thursday the end of cease , fire with' - 135 (Repeated 21 times) 
(hamas announced thursday the end , of cease fire::of cease fire with) 

3) 'hamas announced on thursday the end of cease , fire with their' - 135 (Repeated 4 
times) (null) 

4) 'announced thursday the end of cease , fire with hamas' - 135 (Repeated 94 times) 
(null) 

5) 'hamas announced thursday the end of the cease , fire with their' - 135 (Repeated 
4 times) (null) 

6) 'be announced thursday the end of cease , fire with hamas' - 130 (Repeated 2 
times) (null) 

7) 'hamas announced on thursday the end of cease , fire with' - 130 (Repeated 21 
times) (hamas announced on thursday the end , of cease fire::of cease fire with) 

8) 'announced thursday the end of cease , fire with hamas and' - 130 (Repeated 47 
times) (null) 

9) 'and announced thursday the end of cease , fire with hamas' - 130 (Repeated 4 
times) (null) 

10) 'announced on thursday the end of cease , fire with hamas' - 130 (Repeated 12 
times) (null) 

11) 'announced thursday the end of the cease , fire with hamas' - 130 (Repeated 94 
times) (null) 

12) 'hamas announced thursday the end of the cease , fire with' - 130 (Repeated 21 
times) (null) 

13) 'hamas announced thursday the end of cease , fire with the' - 130 (Repeated 13 
times) (hamas announced thursday the end , of cease fire::of cease fire with the) 

14) 'hamas announced on thursday the end of the cease , fire with their' - 130 
(Repeated 4 times) (null) 

15) 'they announced thursday the end of cease , fire with hamas' - 130 (Repeated 2 
times) (null) 

16) 'were announced thursday the end of cease , fire with hamas' - 130 (Repeated 2 
times) (null) 

17) 'hamas announced thursday the end of cease , fire with them' - 130 (Repeated 1 
times) (null) 

18) 'was announced thursday the end of cease , fire with hamas' - 130 (Repeated 2 
times) (null) 

19) 'thursday announced the end of the cease fire , with hamas' - 130 (Repeated 10 
times) (null) 

20) 'hamas announced thursday the end of cease , fire as' - 125 (Repeated 3 times) 
(null) 



Sorted by repetition 



1) announced thursday the end of cease , fire with - 229 (Score = 115 times) 

2) announced thursday the end of the cease , fire with - 172 (Score = 110 times) 

3) announced thursday the end of cease , fire with hamas - 94 (Score = 135 times) 

4) announced thursday the end of the cease , fire with hamas - 94 (Score = 130 times) 

5) announced thursday the end of cease , fire with the - 83 (Score = 110 times) 

6) announced thursday the end of its unilateral cease , fire with - 80 (Score = 100 
times) 

7) announced thursday the end of the cease , fire with the - 66 (Score = 105 times) 

8) announced thursday the end of cease , fire with their - 62 (Score = 120 times) 

9) announced thursday the end of its unilateral cease , fire with hamas - 58 (Score = 
120 times) 

10) announced thursday the end of cease , fire a - 49 (Score = 105 times) 

11) announced on thursday the end of cease , fire with - 49 (Score = 110 times) 

12) announced thursday the end of the cease , fire a - 47 (Score = 100 times) 

13) announced on thursday the end of the cease , fire with - 47 (Score = 105 times) 

14) announced thursday the end of the cease , fire with hamas and - 47 (Score = 125 
times) 

15) announced thursday the end of cease , fire with hamas and - 47 (Score = 130 
times) 

16) announced thursday the end of the cease , fire with their - 45 (Score = 115 times) 

17) announced on thursday the end of cease , fire a - 33 (Score = 100 times) 

18) announced on thursday the end of the cease , fire a - 32 (Score = 95 times) 

19) hamas announced thursday the end of the cease , fire a - 30 (Score = 120 times) 

20) hamas announced on thursday the end of the cease , fire a - 29 (Score = 115 
times) 

@@@ Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves' , 'de su cese del fuego con' (2,null,6) -- 
(1000) 

No good source overlap 

Skipping su cese del fuego (2 < 2) 

x su cese del fuego con was just translated and returned results 

Number of results = 1000 

Translation for su cese del fuego con took 0.949 

going to try and overlap this piece with the hashmap 

@@@Pre2@@@ 

@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de' , 'su cese del fuego con' 
(2,null,7) -- (1000) 
No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin' , 'su cese del fuego con' (2,null,7) 
-- (1000) 

No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el' , 'su cese del fuego con' (2,null,7) -- 
(1000) 

No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego' , 'su cese del 
fuego con' (2,hamas anuncio este jueves el fin de su cese del fuego con,7) -- (1000) 
Got an overlap in source, checking target 
1500 - 1000 

Overlap check for 'hamas anuncio este jueves el fin de su cese del fuego' , 'su cese del 
fuego con' took 7.002 

*** hamas anuncio este jueves el fin de su cese del fuego (1500), (1000)su cese 
del fuego con = hamas anuncio este jueves el fin de su cese del fuego con 
@@@ 19781 ->0 

Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con 



1) 'hamas announced thursday the end of cease , fire with their' - 140 (Repeated 4 
times) (hamas announced thursday the end of , cease fire::cease fire with their) 

2) 'hamas announced thursday the end of cease , fire with' - 135 (Repeated 21 times) 
(hamas announced thursday the end of, cease fire::of cease fire with) 

3) 'hamas announced on thursday the end of cease , fire with their' - 135 (Repeated 4 
times) (hamas announced on thursday the end of, cease fire::cease fire with their) 

4) 'hamas announced thursday the end of cease , fire his' - 135 (Repeated 3 times) 
(hamas announced thursday the end of, cease fire::cease fire his) 

5) 'announced thursday the end of cease , fire with hamas' - 135 (Repeated 94 times) 
(announced thursday the end of, cease fire::cease fire with hamas) 

6) 'hamas announced thursday the end of the cease , fire with their' - 135 (Repeated 
4 times) (hamas announced thursday the end of the , cease fire::the cease fire with their) 

7) 'be announced thursday the end of cease , fire with hamas' - 130 (Repeated 2 
times) (be announced thursday the end of, cease fire::cease fire with hamas) 

8) 'hamas announced on thursday the end of cease , fire with' - 130 (Repeated 21 
times) (hamas announced on thursday the end of, cease fire::of cease fire with) 

9) 'announced thursday the end of cease , fire with hamas and' - 130 (Repeated 47 
times) (announced thursday the end of, cease fire::cease fire with hamas and) 

10) 'and announced thursday the end of cease , fire with hamas' - 130 (Repeated 4 
times) (and announced thursday the end of , cease fire::cease fire with hamas) 

11) 'hamas announced thursday the end of cease fire , in their' - 130 (Repeated 3 
times) (hamas announced thursday the end of cease , fire in::cease fire in their) 

12) 'hamas announced thursday the end of cease , fire to his' - 130 (Repeated 2 times) 
(hamas announced thursday the end of, cease fire::cease fire to his) 

13) 'announced on thursday the end of cease , fire with hamas' - 130 (Repeated 12 
times) (announced on thursday the end of, cease fire::cease fire with hamas) 

14) 'announced thursday the end of the cease , fire with hamas' - 130 (Repeated 94 
times) (announced thursday the end of the , cease fire::cease fire with hamas) 

15) 'hamas announced thursday the end of cease , fire had his' - 130 (Repeated 2 
times) (hamas announced thursday the end of, cease fire::cease fire had his) 

16) 'hamas announced thursday the end of the cease , fire with' - 130 (Repeated 21 
times) (hamas announced thursday the end of the , cease fire::the cease fire with) 

17) 'hamas announced thursday the end of cease , fire on their' - 130 (Repeated 2 
times) (hamas announced thursday the end of, cease fire::cease fire on their) 

18) 'hamas announced thursday the end of cease fire , for their' - 130 (Repeated 2 
times) (hamas announced thursday the end of cease , fire for::cease fire for their) 

19) 'hamas announced thursday the end of cease , fire with the' - 130 (Repeated 13 
times) (hamas announced thursday the end of, cease fire::of cease fire with the) 

20) 'hamas announced thursday the end of cease fire , in his' - 130 (Repeated 2 times) 
(hamas announced thursday the end of cease , fire in::cease fire in his) 



Sorted by repetition 



1) announced thursday the end of cease , fire with - 178 (Score = 115 times) 

2) announced thursday the end of the cease , fire with - 136 (Score = 110 times) 

3) announced thursday the end of the cease , fire with hamas - 94 (Score = 130 times) 

4) announced thursday the end of cease , fire with hamas - 94 (Score = 135 times) 

5) announced thursday the end of cease , fire with the - 72 (Score = 110 times) 

6) announced thursday the end of cease , fire with their - 51 (Score = 120 times) 

7) announced thursday the end of the cease , fire a - 50 (Score = 100 times) 

8) announced thursday the end of cease , fire a - 48 (Score = 105 times) 

9) announced thursday the end of cease , fire with hamas and - 47 (Score = 130 
times) 

10) announced thursday the end of the cease , fire with hamas and - 47 (Score = 125 
times) 

11) hamas announced thursday the end of the cease , fire a - 47 (Score = 120 times) 

12) announced on thursday the end of cease , fire with - 47 (Score = 110 times) 

13) announced thursday the end of its unilateral cease , fire with - 45 (Score = 100 
times) 

14) announced on thursday the end of the cease , fire with - 39 (Score = 105 times) 

15) announced thursday the end of its unilateral cease , fire with hamas - 36 (Score = 
120 times) 

16) announced on thursday the end of cease , fire a - 30 (Score = 100 times) 

17) announced thursday the end of the cease , fire with the - 30 (Score = 105 times) 

18) hamas announced thursday the end of cease , fire a - 29 (Score = 125 times) 

19) hamas announced on thursday the end of cease , fire a - 27 (Score = 120 times) 

20) hamas announced on thursday the end of the cease , fire a - 26 (Score = 115 
times) 

@@@Pre2@@@ 
@@@Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego con' , 'su cese 
del fuego con' (2,null,7) -- (1000) 
No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese' , 'su cese del fuego con' 
(2,hamas anuncio este jueves el fin de su cese del fuego con,7) -- (1000) 
Got an overlap in source, checking target 
1500 - 1000 

Overlap check for 'hamas anuncio este jueves el fin de su cese' , 'su cese del fuego con' 
took 2.612 

*** hamas anuncio este jueves el fin de su cese (1500), (1000)su cese del fuego 
con = hamas anuncio este jueves el fin de su cese del fuego con 
@@@ 2475 -> 0 

Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con 



1) 'hamas announced thursday the end of cease , fire with their' - 140 (Repeated 4 
times) (null) 

2) 'hamas announced thursday the end of cease , fire with' - 135 (Repeated 21 times) 
(hamas announced thursday the end , of cease fire::of cease fire with) 

3) 'hamas announced on thursday the end of cease , fire with their' - 135 (Repeated 4 
times) (null) 

4) 'hamas announced thursday the end of cease , fire his' - 135 (Repeated 3 times) 
(null) 

5) 'announced thursday the end of cease , fire with hamas' - 135 (Repeated 94 times) 
(null) 

6) 'hamas announced thursday the end of the cease , fire with their' - 135 (Repeated 
4 times) (null) 

7) 'be announced thursday the end of cease , fire with hamas' - 130 (Repeated 2 
times) (null) 

8) 'hamas announced on thursday the end of cease , fire with' - 130 (Repeated 21 
times) (hamas announced on thursday the end , of cease fire::of cease fire with) 

9) 'announced thursday the end of cease , fire with hamas and' - 130 (Repeated 47 
times) (null) 

10) 'and announced thursday the end of cease , fire with hamas' - 130 (Repeated 4 
times) (null) 

11) 'hamas announced thursday the end of cease fire , in their' - 130 (Repeated 3 
times) (null) 

12) 'hamas announced thursday the end of cease , fire to his' - 130 (Repeated 2 times) 
(null) 

13) 'announced on thursday the end of cease , fire with hamas' - 130 (Repeated 12 
times) (null) 

14) 'announced thursday the end of the cease , fire with hamas' - 130 (Repeated 94 
times) (null) 

15) 'hamas announced thursday the end of cease , fire had his' - 130 (Repeated 2 
times) (null) 

16) 'hamas announced thursday the end of the cease , fire with' - 130 (Repeated 21 
times) (null) 

17) 'hamas announced thursday the end of cease , fire on their' - 130 (Repeated 2 
times) (null) 

18) 'hamas announced thursday the end of cease fire , for their' - 130 (Repeated 2 
times) (null) 

19) 'hamas announced thursday the end of cease , fire with the' - 130 (Repeated 13 
times) (hamas announced thursday the end , of cease fire::of cease fire with the) 

20) 'hamas announced thursday the end of cease fire , in his' - 130 (Repeated 2 times) 
(null) 



Sorted by repetition 



1) announced thursday the end of cease , fire with - 178 (Score = 115 times) 

2) announced thursday the end of the cease , fire with - 136 (Score = 110 times) 

3) announced thursday the end of cease , fire with hamas - 94 (Score = 135 times) 

4) announced thursday the end of the cease , fire with hamas - 94 (Score = 130 times) 

5) announced thursday the end of cease , fire with the - 72 (Score = 110 times) 

6) announced thursday the end of cease , fire with their - 51 (Score = 120 times) 

7) announced thursday the end of the cease , fire a - 50 (Score = 100 times) 

8) announced thursday the end of cease , fire a - 48 (Score = 105 times) 

9) announced on thursday the end of cease , fire with - 47 (Score = 110 times) 

10) hamas announced thursday the end of the cease , fire a - 47 (Score = 120 times) 

11) announced thursday the end of the cease , fire with hamas and - 47 (Score = 125 
times) 

12) announced thursday the end of cease , fire with hamas and - 47 (Score = 130 
times) 

13) announced thursday the end of its unilateral cease , fire with - 45 (Score = 100 
times) 

14) announced on thursday the end of the cease , fire with - 39 (Score = 105 times) 

15) announced thursday the end of its unilateral cease , fire with hamas - 36 (Score = 
120 times) 

16) announced thursday the end of the cease , fire with the - 30 (Score = 105 times) 

17) announced on thursday the end of cease , fire a - 30 (Score = 100 times) 

18) hamas announced thursday the end of cease , fire a - 29 (Score = 125 times) 

19) hamas announced on thursday the end of cease , fire a - 27 (Score = 120 times) 

20) hamas announced on thursday the end of the cease , fire a - 26 (Score = 115 
times) 

@@@ Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves' , 'su cese del fuego con' (2,null,7) -- 
(1000) 

No good source overlap 
{} 

x su cese del fuego con israel was just translated and returned results 
Number of results = 631 

Translation for su cese del fuego con israel took 1.12 
going to try and overlap this piece with the hashmap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de' , 'su cese del fuego con israel' 
(2,null,7) -- (631) 
No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin' , 'su cese del fuego con israel' 
(2,null,7) -- (631) 
No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el' , 'su cese del fuego con israel' 
(2,null,7) -- (631) 
No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego' , 'su cese del 
fuego con israel' (2,hamas anuncio este jueves el fin de su cese del fuego con israel,7) -- 
(631) 

Got an overlap in source, checking target 
1500-631 

Overlap check for 'hamas anuncio este jueves el fin de su cese del fuego' , 'su cese del 
fuego con israel' took 7.102 

*** hamas anuncio este jueves el fin de su cese del fuego (1500), (631)su cese 
del fuego con israel = hamas anuncio este jueves el fin de su cese del fuego con 
israel 

### 14957 -> 14957 



Overlap results for hamas anuncio este jueves el fin de su cese del fuego con israel 

1) hamas announced thursday the end of cease , fire with israel - 155 (Repeated 30 
times) (hamas announced thursday the end of, cease fire::cease fire with israel) 

2) hamas announced thursday the end of cease , fire israel - 150 (Repeated 10 times) 
(hamas announced thursday the end of, cease fire::cease fire israel) 

3) hamas announced on thursday the end of cease , fire with israel - 150 (Repeated 26 
times) (hamas announced on thursday the end of, cease fire::cease fire with israel) 

4) hamas announced thursday the end of cease , fire with israel was - 150 (Repeated 
1 times) (hamas announced thursday the end of, cease fire::cease fire with israel was) 

5) hamas announced thursday the end of cease fire , by israel with - 150 (Repeated 3 
times) (hamas announced thursday the end of cease , fire by::cease fire by israel with) 

6) hamas announced thursday the end of cease , fire with israel and - 150 (Repeated 
12 times) (hamas announced thursday the end of, cease fire::cease fire with israel and) 

7) hamas announced thursday the end of the cease , fire with israel - 150 (Repeated 
27 times) (hamas announced thursday the end of the , cease fire::the cease fire with 
israel) 

8) hamas announced thursday the end of cease , fire with israel the - 150 (Repeated 3 
times) (hamas announced thursday the end of, cease fire::cease fire with israel the) 

9) hamas announced thursday the end of cease fire , by israel - 145 (Repeated 4 
times) (hamas announced thursday the end of cease , fire by::cease fire by israel) 

10) hamas announced thursday the end of the cease , fire with israel the - 145 
(Repeated 3 times) (hamas announced thursday the end of the , cease fire::cease fire with 
israel the) 

11) hamas announced thursday the end of cease , fire israel is - 145 (Repeated 5 
times) (hamas announced thursday the end of, cease fire::cease fire israel is) 

12) hamas announced thursday the end of the cease , fire with israel and - 145 
(Repeated 9 times) (hamas announced thursday the end of the , cease fire::the cease fire 
with israel and) 

13) hamas announced on thursday the end of cease , fire with israel the - 145 
(Repeated 2 times) (hamas announced on thursday the end of, cease fire::cease fire with 
israel the) 

14) hamas announced thursday the end of cease fire , and israel - 145 (Repeated 5 
times) (hamas announced thursday the end of cease , fire and::cease fire and israel) 

15) hamas announced on thursday the end of the cease , fire with israel - 145 
(Repeated 20 times) (hamas announced on thursday the end of the , cease fire::the cease 
fire with israel) 

16) hamas announced on thursday the end of cease , fire with israel and - 145 
(Repeated 9 times) (hamas announced on thursday the end of, cease fire::cease fire with 
israel and) 

17) hamas announced on thursday the end of cease , fire israel - 145 (Repeated 7 
times) (hamas announced on thursday the end of, cease fire::cease fire israel) 

18) hamas announced thursday the end of the cease , fire by israel with - 145 
(Repeated 3 times) (hamas announced thursday the end of the , cease fire::cease fire by 
israel with) 

19) hamas announced on thursday the end of cease fire , by israel with - 145 
(Repeated 3 times) (hamas announced on thursday the end of cease , fire by::cease fire by 
israel with) 

20) hamas announced thursday the end of the cease , fire with israel was - 145 
(Repeated 1 times) (hamas announced thursday the end of the , cease fire::cease fire with 
israel was) 



Sorted by repetition 



1) announced thursday the end of cease , fire with israel - 279 (Score = 135 times) 

2) announced thursday the end of the cease , fire with israel - 209 (Score = 130 times) 

3) announced thursday the end of cease , fire israel - 113 (Score = 130 times) 

4) announced thursday the end of cease fire , by israel - 91 (Score = 125 times) 

5) announced thursday the end of cease , fire with israel and - 85 (Score = 130 times) 

6) announced on thursday the end of cease , fire with israel - 65 (Score = 130 times) 

7) announced thursday the end of the cease , fire by israel - 53 (Score = 120 times) 

8) announced thursday the end of cease , fire with israel the - 53 (Score = 130 times) 

9) announced thursday the end of cease fire , by israel with - 52 (Score = 130 times) 

10) announced thursday the end of cease fire , and israel - 50 (Score = 125 times) 

11) announced thursday the end of cease , fire israel is - 50 (Score = 125 times) 

12) announced thursday the end of the cease , fire israel - 49 (Score = 125 times) 

13) announced thursday the end of cease , fire with israel was - 47 (Score = 130 
times) 

14) announced thursday the end of the cease , fire with israel and - 46 (Score = 125 
times) 

15) announced thursday the end of the cease , fire by israel with - 46 (Score = 125 
times) 

16) announced thursday the end of the cease , fire with israel the - 43 (Score = 125 
times) 

17) announced thursday the end of its unilateral cease , fire with israel - 43 (Score = 
120 times) 

18) e announced thursday the end of cease , fire with israel - 39 (Score = 115 times) 

19) announced on thursday the end of the cease , fire with israel - 38 (Score = 125 
times) 

20) announced thursday the end of the cease , fire with israel was - 37 (Score = 125 
times) 
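
Each block in this log prints its candidates twice: once ordered by the score after the dash, and once under "Sorted by repetition". The two orderings can be illustrated with a small sketch. The `Candidate` type and its field names are hypothetical; the four rows are copied from the "Overlap results" list above. The log's second list contains shorter strings with their own counts, and how those counts are aggregated is not stated here, so the sketch does not reproduce that aggregation and only reorders the same rows by their "Repeated" count.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str
    score: int      # the number printed after the dash in the results list
    repeated: int   # the "Repeated N times" count for the same entry


# Rows 1, 2, 3 and 9 of the "Overlap results" list for the "... con israel" phrase.
candidates = [
    Candidate("hamas announced thursday the end of cease , fire with israel", 155, 30),
    Candidate("hamas announced thursday the end of cease , fire israel", 150, 10),
    Candidate("hamas announced on thursday the end of cease , fire with israel", 150, 26),
    Candidate("hamas announced thursday the end of cease fire , by israel", 145, 4),
]

# sorted() is stable, so entries with equal keys keep their input order.
by_score = sorted(candidates, key=lambda c: c.score, reverse=True)
by_repetition = sorted(candidates, key=lambda c: c.repeated, reverse=True)

for rank, c in enumerate(by_repetition, start=1):
    print(f"{rank}) {c.text} - {c.repeated} (Score = {c.score})")
```

Keeping both orderings side by side makes it easy to see when a high-scoring candidate is rarely attested, as with the fourth row here.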

@@@ Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego con' , 'su cese 
del fuego con israel' (2,hamas anuncio este jueves el fin de su cese del fuego con 
israel,7) -- (631) 



Got an overlap in source, checking target 
1500-631 

Overlap check for 'hamas anuncio este jueves el fin de su cese del fuego con' , 'su cese del 
fuego con israel' took 3.371 

*** hamas anuncio este jueves el fin de su cese del fuego con (1500), (631)su 
cese del fuego con israel = hamas anuncio este jueves el fin de su cese del 
fuego con israel 
@@@ 16056 ->0 



Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con israel 



1) 'hamas announced thursday the end of cease fire , with israel' - 155 (Repeated 1 
times) (hamas announced thursday the end of cease , fire with::cease fire with israel) 

2) 'hamas announced thursday the end of cease , fire with israel' - 155 (Repeated 27 
times) (null) 

3) 'hamas announced on thursday the end of cease fire , with israel' - 150 (Repeated 
1 times) (hamas announced on thursday the end of cease , fire with::cease fire with israel) 

4) 'hamas announced thursday the end of cease , fire israel' - 150 (Repeated 8 times) 
(null) 

5) 'hamas announced on thursday the end of cease , fire with israel' - 150 (Repeated 
22 times) (null) 

6) 'hamas announced thursday the end of cease , fire with israel was' - 150 (Repeated 
1 times) (null) 

7) 'hamas announced thursday the end of cease fire , by israel with' - 150 (Repeated 
3 times) (hamas announced thursday the end of cease , fire by::cease fire by israel with) 

8) 'hamas announced thursday the end of cease , fire with israel and' - 150 (Repeated 
9 times) (null) 

9) 'hamas announced thursday the end of cease fire with , israel and' - 150 (Repeated 
10 times) (hamas announced thursday the end of cease , fire with israel::cease fire with 
israel and) 

10) 'hamas announced thursday the end of cease fire with , israel was' - 150 
(Repeated 1 times) (hamas announced thursday the end of cease , fire with israel::cease 
fire with israel was) 

11) 'hamas announced thursday the end of the cease , fire with israel' - 150 
(Repeated 23 times) (null) 

12) 'hamas announced thursday the end of the cease fire , with israel' - 150 
(Repeated 1 times) (hamas announced thursday the end of the cease , fire with::the cease 
fire with israel) 

13) 'hamas announced thursday the end of cease fire with , israel the' - 150 
(Repeated 3 times) (hamas announced thursday the end of cease , fire with israel::cease 
fire with israel the) 

14) 'hamas announced thursday the end of cease , fire with israel the' - 150 
(Repeated 3 times) (null) 

15) 'hamas announced thursday the end of the cease fire with , israel the' - 145 
(Repeated 2 times) (hamas announced thursday the end of the cease , fire with 
israel::cease fire with israel the) 

16) 'hamas announced thursday the end of the cease fire , by israel with' - 145 
(Repeated 2 times) (hamas announced thursday the end of the cease , fire by::cease fire 
by israel with) 

17) 'hamas announced on thursday the end of cease fire with , israel was' - 145 
(Repeated 1 times) (hamas announced on thursday the end of cease , fire with 
israel::cease fire with israel was) 

18) 'hamas announced on thursday the end of the cease fire , with israel' - 145 
(Repeated 1 times) (hamas announced on thursday the end of the cease , fire with::the 
cease fire with israel) 

19) 'hamas announced thursday the end of the cease fire with , israel was' - 145 
(Repeated 1 times) (hamas announced thursday the end of the cease , fire with 
israel::cease fire with israel was) 

20) 'hamas announced thursday the end of cease fire , by israel' - 145 (Repeated 4 
times) (hamas announced thursday the end of cease , fire by::cease fire by israel) 



Sorted by repetition 



1) announced thursday the end of cease , fire with israel - 253 (Score = 135 times) 

2) announced thursday the end of the cease , fire with israel - 129 (Score = 130 times) 

3) announced thursday the end of cease , fire israel - 82 (Score = 130 times) 

4) announced thursday the end of cease , fire with israel and - 68 (Score = 130 times) 

5) announced thursday the end of cease fire , by israel - 66 (Score = 125 times) 

6) announced thursday the end of cease fire , with israel - 66 (Score = 135 times) 

7) announced on thursday the end of cease , fire with israel - 51 (Score = 130 times) 

8) announced thursday the end of cease fire , by israel with - 50 (Score = 130 times) 

9) announced thursday the end of cease , fire with israel the - 50 (Score = 130 times) 

10) announced thursday the end of cease , fire with israel was - 47 (Score = 130 
times) 

11) announced thursday the end of its unilateral cease , fire with israel - 43 (Score = 
120 times) 

12) announced on thursday the end of the cease , fire with israel - 35 (Score = 125 
times) 

13) announced thursday the end of the cease , fire by israel - 33 (Score = 120 times) 

14) announced thursday the end of the cease fire , with israel - 32 (Score = 130 times) 

15) e announced thursday the end of cease , fire with israel - 31 (Score = 115 times) 

16) announced thursday the end of the cease , fire israel - 30 (Score = 125 times) 

17) announced thursday the end of the cease , fire with israel and - 29 (Score = 125 
times) 

18) hamas announced thursday the end of cease , fire with israel - 27 (Score = 155 
times) 

19) announced on thursday the end of its unilateral cease , fire with israel - 26 (Score 
= 115 times) 

20) announced thursday the end of the cease , fire by israel with - 26 (Score = 125 
times) 

@@@Pre2@@@ 
@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese' , 'su cese del fuego con 
israel' (2,hamas anuncio este jueves el fin de su cese del fuego con israel,7) -- (631) 
Got an overlap in source, checking target 
1500-631 

Overlap check for 'hamas anuncio este jueves el fin de su cese' , 'su cese del fuego con 
israel' took 2.783 

*** hamas anuncio este jueves el fin de su cese (1500), (631)su cese del fuego 
con israel = hamas anuncio este jueves el fin de su cese del fuego con israel 
@@@ 1575 ->0 

Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con israel 



1) 'hamas announced thursday the end of cease fire , with israel' - 155 (Repeated 1 
times) (null) 

2) 'hamas announced thursday the end of cease , fire with israel' - 155 (Repeated 27 
times) (null) 

3) 'hamas announced on thursday the end of cease fire , with israel' - 150 (Repeated 
1 times) (null) 

4) 'hamas announced thursday the end of cease , fire israel' - 150 (Repeated 8 times) 
(null) 

5) 'hamas announced on thursday the end of cease , fire with israel' - 150 (Repeated 
22 times) (null) 

6) 'hamas announced thursday the end of cease , fire with israel was' - 150 (Repeated 
1 times) (null) 

7) 'hamas announced thursday the end of cease fire , by israel with' - 150 (Repeated 
3 times) (null) 

8) 'hamas announced thursday the end of cease , fire with israel and' - 150 (Repeated 
9 times) (null) 

9) 'hamas announced thursday the end of cease fire with , israel and' - 150 (Repeated 
9 times) (null) 

10) 'hamas announced thursday the end of cease fire with , israel was' - 150 

(Repeated 1 times) (null) 

11) 'hamas announced thursday the end of the cease , fire with israel' - 150 

(Repeated 23 times) (null) 

12) 'hamas announced thursday the end of the cease fire , with israel' - 150 

(Repeated 1 times) (null) 

13) 'hamas announced thursday the end of cease fire with , israel the' - 150 

(Repeated 3 times) (null) 

14) 'hamas announced thursday the end of cease , fire with israel the' - 150 

(Repeated 3 times) (null) 

15) 'hamas announced thursday the end of the cease fire with , israel the' - 145 

(Repeated 2 times) (null) 

16) 'hamas announced thursday the end of the cease fire , by israel with' - 145 

(Repeated 2 times) (null) 

17) 'hamas announced on thursday the end of cease fire with , israel was' - 145 

(Repeated 1 times) (null) 






18) 'hamas announced on thursday the end of the cease fire , with israel' - 145 

(Repeated 1 times) (null) 

19) 'hamas announced thursday the end of the cease fire with , israel was' - 145
(Repeated 1 times) (null) 

20) 'hamas announced thursday the end of cease fire , by israel' - 145 (Repeated 4 
times) (null) 



Sorted by repetition 



1) announced thursday the end of cease , fire with israel - 252 (Score = 135 times) 

2) announced thursday the end of the cease , fire with israel - 126 (Score = 130 times) 

3) announced thursday the end of cease , fire israel - 81 (Score = 130 times) 

4) announced thursday the end of cease , fire with israel and - 67 (Score = 130 times) 

5) announced thursday the end of cease fire , with israel - 66 (Score = 135 times) 

6) announced thursday the end of cease fire , by israel - 66 (Score = 125 times) 

7) announced on thursday the end of cease , fire with israel - 51 (Score = 130 times) 

8) announced thursday the end of cease , fire with israel the - 50 (Score = 130 times) 

9) announced thursday the end of cease fire , by israel with - 50 (Score = 130 times) 

10) announced thursday the end of cease , fire with israel was - 47 (Score = 130 
times) 

11) announced thursday the end of its unilateral cease , fire with israel - 43 (Score =
120 times) 

12) announced on thursday the end of the cease , fire with israel - 35 (Score = 125 
times) 

13) announced thursday the end of the cease , fire by israel - 33 (Score = 120 times) 

14) announced thursday the end of the cease fire , with israel - 32 (Score = 130 times) 

15) e announced thursday the end of cease , fire with israel - 31 (Score = 115 times) 

16) announced thursday the end of the cease , fire israel - 29 (Score = 125 times) 

17) hamas announced thursday the end of cease , fire with israel - 27 (Score = 155 
times) 

18) announced thursday the end of the cease , fire with israel and - 27 (Score = 125 
times) 

19) announced on thursday the end of its unilateral cease , fire with israel - 26 (Score 
= 115 times) 

20) announced thursday the end of the cease , fire by israel with - 26 (Score = 125 
times) 
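
The source-side step recorded in the log above, where two Spanish fragments are joined on the words they share, can be sketched as follows. This is only an illustration under stated assumptions, not the implementation that produced the log: the name `merge_on_overlap`, the word-level suffix/prefix rule, and the `min_overlap=2` threshold are all hypothetical, chosen because they are consistent with the "Got an overlap in source" and "No good source overlap" outcomes shown in this section.

```python
from typing import Optional


def merge_on_overlap(left: str, right: str, min_overlap: int = 2) -> Optional[str]:
    """Join two phrases when a suffix of `left` equals a prefix of `right`.

    Assumptions (not taken from the original system):
    - the overlap is measured in whole words;
    - at least `min_overlap` words must be shared;
    - `right` must contribute at least one new word.
    Returns the merged phrase, or None when no such overlap exists.
    """
    lw, rw = left.split(), right.split()
    # Try the longest candidate overlap first; stop before len(rw) so that
    # the right-hand phrase always extends the left-hand one.
    longest = min(len(lw), len(rw) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if lw[-k:] == rw[:k]:
            return " ".join(lw + rw[k:])
    return None


if __name__ == "__main__":
    print(merge_on_overlap(
        "hamas anuncio este jueves el fin de su cese",
        "su cese del fuego con israel",
    ))
```

With these assumptions, the pair 'hamas anuncio este jueves' , 'su cese del fuego con israel' yields no merge, while the pair sharing 'su cese' yields the full sentence, which matches the two outcomes logged for those pairs.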

@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves' , 'su cese del fuego con israel' (2,null,7) -- (631)

No good source overlap 
{} 

x cese del fuego con was just translated and returned results 

Number of results = 1000 

Translation for cese del fuego con took 0.705 

going to try and overlap this piece with the hashmap
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de' , 'cese del fuego con' (2,null,8) -- (1000)

No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin' , 'cese del fuego con' (2,null,8) -- (1000)

No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el' , 'cese del fuego con' (2,null,8) -- (1000)

No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego con israel' ,
'cese del fuego con' (2,null,8) -- (1000)

No good source overlap 

@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego' , 'cese del
fuego con' (2,hamas anuncio este jueves el fin de su cese del fuego con,8) -- (1000)
Got an overlap in source, checking target 
1500-1000 

Overlap check for 'hamas anuncio este jueves el fin de su cese del fuego' , 'cese del fuego
con' took 9.486

*** hamas anuncio este jueves el fin de su cese del fuego (1500), (1000)cese
del fuego con = hamas anuncio este jueves el fin de su cese del fuego con
@@@ 29730 -> 0 

Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con 



1) 'hamas announced thursday the end of cease , fire with their' - 140 (Repeated 4
times) (hamas announced thursday the end of, cease fire::cease fire with their)

2) 'hamas announced thursday the end of cease , fire with' - 135 (Repeated 93 times) 
(hamas announced thursday the end of, cease fire::of cease fire with) 

3) 'hamas announced on thursday the end of cease , fire with their' - 135 (Repeated 4
times) (hamas announced on thursday the end of, cease fire::cease fire with their) 

4) 'hamas announced thursday the end of cease , fire his' - 135 (Repeated 3 times) 
(null) 

5) 'hamas announced thursday the end of cease , fire of' - 135 (Repeated 10 times)
(hamas announced thursday the end of, cease fire::cease fire of) 

6) 'announced thursday the end of cease , fire with hamas' - 135 (Repeated 141 times) 
(announced thursday the end of, cease fire::cease fire with hamas)

7) 'hamas announced thursday the end of the cease , fire with their' - 135 (Repeated
4 times) (hamas announced thursday the end of the , cease fire::cease fire with their) 

8) 'hamas announced on thursday the end of cease , fire with' - 130 (Repeated 80 
times) (hamas announced on thursday the end of, cease fire::of cease fire with) 

9) 'announced thursday the end of cease , fire with hamas and' - 130 (Repeated 94 
times) (announced thursday the end of, cease fire::cease fire with hamas and) 

10) 'and announced thursday the end of cease , fire with hamas' - 130 (Repeated 6 
times) (and announced thursday the end of, cease fire::cease fire with hamas) 

11) 'hamas announced thursday the end of cease fire , in their' - 130 (Repeated 3
times) (null) 

12) 'hamas announced thursday the end of cease , fire with in' - 130 (Repeated 6 
times) (hamas announced thursday the end of, cease fire::cease fire with in) 

13) 'announced thursday the end of the cease , fire with hamas' - 130 (Repeated 103 
times) (announced thursday the end of the , cease fire::cease fire with hamas) 

14) 'hamas announced thursday the end of the cease , fire with' - 130 (Repeated 80 
times) (hamas announced thursday the end of the , cease fire::the cease fire with)

15) 'hamas announced thursday the end of cease , fire on their' - 130 (Repeated 2 
times) (null) 

16) 'hamas announced thursday the end of cease fire , for their' - 130 (Repeated 2 
times) (null) 

17) 'hamas announced thursday the end of cease , fire with the' - 130 (Repeated 52 
times) (hamas announced thursday the end of, cease fire::of cease fire with the) 

18) 'hamas announced on thursday the end of the cease , fire with their' - 130 

(Repeated 4 times) (hamas announced on thursday the end of the , cease fire::cease fire 
with their) 

19) 'they announced thursday the end of cease , fire with hamas' - 130 (Repeated 3 
times) (they announced thursday the end of, cease fire::cease fire with hamas) 

20) 'were announced thursday the end of cease , fire with hamas' - 130 (Repeated 3 
times) (were announced thursday the end of, cease fire::cease fire with hamas) 



Sorted by repetition 



1) announced thursday the end of cease , fire with - 276 (Score = 115 times) 

2) announced thursday the end of the cease , fire with - 199 (Score = 110 times) 

3) announced thursday the end of cease , fire with hamas - 141 (Score = 135 times) 

4) announced on thursday the end of cease , fire with - 106 (Score = 110 times) 

5) announced thursday the end of the cease , fire with hamas - 103 (Score = 130 
times) 

6) announced thursday the end of cease , fire with hamas and - 94 (Score = 130 
times) 

7) hamas announced thursday the end of cease , fire with - 93 (Score = 135 times) 

8) hamas announced on thursday the end of cease , fire with - 80 (Score = 130 times) 

9) hamas announced thursday the end of the cease , fire with - 80 (Score = 130 times) 

10) announced thursday the end of cease , fire with the - 78 (Score = 110 times) 

11) announced on thursday the end of the cease , fire with - 58 (Score = 105 times)

12) announced thursday the end of the cease , fire with hamas and - 56 (Score = 125
times)

13) hamas announced thursday the end of cease , fire with the - 52 (Score = 130 
times) 

14) announced thursday the end of the cease , fire with the - 52 (Score = 105 times) 

15) announced on thursday the end of cease , fire with the - 49 (Score = 105 times) 

16) announced thursday the end of cease , fire with hamas and the - 47 (Score = 125 
times) 

17) hamas announced thursday the end of the cease , fire with the - 43 (Score = 125 
times) 

18) hamas announced on thursday the end of cease , fire with the - 43 (Score = 125 
times) 

19) hamas announced on thursday the end of the cease , fire with - 40 (Score = 125 
times) 

20) announced thursday the end of cease , fire a - 38 (Score = 105 times) 
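
Each overlap step above prints its candidates twice: once ordered by score and once under "Sorted by repetition". A minimal sketch of producing two such orderings from one candidate set is given below. The `Candidate` record and both helper names are hypothetical; the three sample rows reuse score and repetition values printed in the preceding score-ordered list, and how the original system breaks ties is not stated, so ties here simply keep their input order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    text: str       # candidate target-language phrase
    score: int      # value printed after the phrase in the score-ordered list
    repeated: int   # value printed as "Repeated N times"


def by_score(cands: List[Candidate]) -> List[Candidate]:
    # Highest score first; sorted() is stable, so ties keep input order.
    return sorted(cands, key=lambda c: c.score, reverse=True)


def by_repetition(cands: List[Candidate]) -> List[Candidate]:
    # Most frequently repeated first; ties keep input order.
    return sorted(cands, key=lambda c: c.repeated, reverse=True)


# Sample rows copied from the log (items 1, 2 and 6 of the score-ordered list).
SAMPLE = [
    Candidate("hamas announced thursday the end of cease , fire with", 135, 93),
    Candidate("hamas announced thursday the end of cease , fire with their", 140, 4),
    Candidate("announced thursday the end of cease , fire with hamas", 135, 141),
]

if __name__ == "__main__":
    for c in by_score(SAMPLE):
        print(f"'{c.text}' - {c.score} (Repeated {c.repeated} times)")
    for c in by_repetition(SAMPLE):
        print(f"{c.text} - {c.repeated} (Score = {c.score})")
```

Keeping both orderings side by side makes it easy to see candidates that score well but occur rarely, and frequent candidates that score lower.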
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego con' , 'cese del
fuego con' (2,null,8) -- (1000)

No good source overlap 

@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese' , 'cese del fuego con'
(2,null,8) -- (1000)
No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves' , 'cese del fuego con' (2,null,8) -- (1000)
No good source overlap 

{} 

x cese del fuego con israel was just translated and returned results 
Number of results = 748 

Translation for cese del fuego con israel took 0.888 
going to try and overlap this piece with the hashmap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de' , 'cese del fuego con israel' 

(2,null,8) -- (748) 
No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin' , 'cese del fuego con israel' 

(2,null,8) -- (748) 
No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el' , 'cese del fuego con israel' (2,null,8) -- (748)

No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego con israel' ,
'cese del fuego con israel' (2,null,8) -- (748)
No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego' , 'cese del
fuego con israel' (2,hamas anuncio este jueves el fin de su cese del fuego con israel,8) --
(748)

Got an overlap in source, checking target 
1500-748 

Overlap check for 'hamas anuncio este jueves el fin de su cese del fuego' , 'cese del fuego
con israel' took 7.89

*** hamas anuncio este jueves el fin de su cese del fuego (1500), (748)cese del 
fuego con israel = hamas anuncio este jueves el fin de su cese del fuego con 
israel 

@@@ 18681 ->0 



Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con israel 



1) 'hamas announced thursday the end of cease , fire with israel' - 155 (Repeated 28
times) (hamas announced thursday the end of, cease fire::cease fire with israel)

2) 'hamas announced thursday the end of cease fire , with israel' - 155 (Repeated 1
times) (null)

3) 'hamas announced on thursday the end of cease fire , with israel' - 150 (Repeated
1 times) (null) 

4) 'hamas announced thursday the end of cease fire with , israel and' - 150 (Repeated 
9 times) (null) 

5) 'hamas announced thursday the end of the cease , fire with israel' - 150 (Repeated 
24 times) (hamas announced thursday the end of the , cease fire::the cease fire with 
israel) 

6) 'hamas announced thursday the end of cease , fire with israel the' - 150 (Repeated 
3 times) (hamas announced thursday the end of, cease fire::cease fire with israel the) 

7) 'hamas announced thursday the end of cease , fire israel' - 150 (Repeated 8 times) 
(hamas announced thursday the end of, cease fire::cease fire israel) 

8) 'hamas announced on thursday the end of cease , fire with israel' - 150 (Repeated 
23 times) (hamas announced on thursday the end of, cease fire::cease fire with israel) 

9) 'hamas announced thursday the end of cease , fire with israel was' - 150 (Repeated 
1 times) (hamas announced thursday the end of, cease fire::cease fire with israel was) 

10) 'hamas announced thursday the end of cease fire , by israel with' - 150 (Repeated 
3 times) (hamas announced thursday the end of cease , fire by::cease fire by israel with)

11) 'hamas announced thursday the end of cease , fire with israel and' - 150
(Repeated 9 times) (hamas announced thursday the end of, cease fire::cease fire with
israel and)

12) 'hamas announced thursday the end of cease fire with , israel was' - 150

(Repeated 1 times) (null) 

13) 'hamas announced thursday the end of the cease fire , with israel' - 150

(Repeated 1 times) (null) 

14) 'hamas announced thursday the end of cease fire with , israel the' - 150

(Repeated 3 times) (null) 

15) 'hamas announced thursday the end of the cease fire with , israel the' - 145 

(Repeated 2 times) (null) 

16) 'hamas announced on thursday the end of cease fire with , israel was' - 145
(Repeated 1 times) (null) 

17) 'hamas announced on thursday the end of the cease fire , with israel' - 145 

(Repeated 1 times) (null) 

18) 'hamas announced thursday the end of the cease fire with , israel was' - 145 

(Repeated 1 times) (null) 

19) 'hamas announced thursday the end of the cease , fire with israel the' - 145 

(Repeated 3 times) (hamas announced thursday the end of the , cease fire::cease fire with 
israel the) 

20) 'hamas announced on thursday the end of cease fire with , israel and' - 145 

(Repeated 8 times) (null) 



Sorted by repetition 



1) announced thursday the end of cease , fire with israel - 259 (Score = 135 times) 

2) announced thursday the end of the cease , fire with israel - 122 (Score = 130 times) 

3) announced thursday the end of cease , fire israel - 71 (Score = 130 times) 

4) announced thursday the end of cease , fire with israel and - 67 (Score = 130 times) 

5) announced thursday the end of cease fire , by israel - 62 (Score = 125 times) 

6) announced thursday the end of cease fire , with israel - 61 (Score = 135 times) 

7) announced on thursday the end of cease , fire with israel - 51 (Score = 130 times) 

8) announced thursday the end of cease , fire with israel the - 51 (Score = 130 times) 

9) announced thursday the end of cease fire , by israel with - 50 (Score = 130 times) 

10) announced thursday the end of cease , fire with israel was - 47 (Score = 130 
times) 

11) announced thursday the end of its unilateral cease , fire with israel - 44 (Score = 
120 times) 

12) announced on thursday the end of the cease , fire with israel - 37 (Score = 125 
times) 

13) e announced thursday the end of cease , fire with israel - 34 (Score = 115 times) 

14) announced thursday the end of the cease , fire israel - 32 (Score = 125 times) 

15) announced thursday the end of the cease fire , with israel - 30 (Score = 130 times) 

16) hamas announced thursday the end of cease , fire with israel - 28 (Score = 155 
times) 

17) announced on thursday the end of its unilateral cease , fire with israel - 26 (Score 
= 115 times) 

18) hamas announced thursday the end of the cease , fire with israel - 24 (Score =
150 times)

19) announced thursday the end of cease fire , and israel - 23 (Score = 125 times) 

20) announced thursday the end of the cease , fire with israel and - 23 (Score = 125 
times) 

@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego con' , 'cese del
fuego con israel' (2,hamas anuncio este jueves el fin de su cese del fuego con israel,8) --
(748)

Got an overlap in source, checking target 
1500-748 

Overlap check for 'hamas anuncio este jueves el fin de su cese del fuego con' , 'cese del
fuego con israel' took 3.299

*** hamas anuncio este jueves el fin de su cese del fuego con (1500), (748)cese 
del fuego con israel = hamas anuncio este jueves el fin de su cese del fuego con 
israel 

@@@ 2840 -> 0 

Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con israel 



1) 'hamas announced thursday the end of cease , fire with israel' - 155 (Repeated 28
times) (null)

2) 'hamas announced thursday the end of cease fire , with israel' - 155 (Repeated 1
times) (hamas announced thursday the end of cease , fire with::cease fire with israel) 

3) 'hamas announced on thursday the end of cease fire , with israel' - 150 (Repeated 

1 times) (hamas announced on thursday the end of cease , fire with::cease fire with israel) 

4) 'hamas announced thursday the end of cease fire with , israel and' - 150 (Repeated 
9 times) (hamas announced thursday the end of cease , fire with israel::cease fire with
israel and) 

5) 'hamas announced thursday the end of the cease , fire with israel' - 150 (Repeated 
24 times) (null) 

6) 'hamas announced thursday the end of cease , fire with israel the' - 150 (Repeated 
3 times) (null) 

7) 'hamas announced thursday the end of cease , fire israel' - 150 (Repeated 8 times) 
(null) 

8) 'hamas announced on thursday the end of cease , fire with israel' - 150 (Repeated 
23 times) (null) 

9) 'hamas announced thursday the end of cease , fire with israel was' - 150 (Repeated 
1 times) (null) 

10) 'hamas announced thursday the end of cease fire , by israel with' - 150 (Repeated 
3 times) (hamas announced thursday the end of cease , fire by::cease fire by israel with)

11) 'hamas announced thursday the end of cease fire with , israel as' - 150 (Repeated
3 times) (hamas announced thursday the end of cease , fire with israel::fire with israel as)

12) 'hamas announced thursday the end of cease , fire with israel and' - 150 
(Repeated 9 times) (null) 

13) 'hamas announced thursday the end of cease fire with , israel was' - 150
(Repeated 1 times) (hamas announced thursday the end of cease , fire with israel::cease
fire with israel was)

14) 'hamas announced thursday the end of the cease fire , with israel' - 150

(Repeated 1 times) (hamas announced thursday the end of the cease , fire with::the cease
fire with israel)

15) 'hamas announced thursday the end of cease fire with , israel the' - 150

(Repeated 3 times) (hamas announced thursday the end of cease , fire with israel::cease
fire with israel the)

16) 'hamas announced thursday the end of the cease fire with , israel the' - 145

(Repeated 2 times) (hamas announced thursday the end of the cease , fire with
israel::cease fire with israel the)

17) 'hamas announced on thursday the end of cease fire with , israel was' - 145

(Repeated 1 times) (hamas announced on thursday the end of cease , fire with
israel::cease fire with israel was)

18) 'hamas announced on thursday the end of the cease fire , with israel' - 145

(Repeated 1 times) (hamas announced on thursday the end of the cease , fire with::the
cease fire with israel)

19) 'hamas announced thursday the end of the cease fire with , israel was' - 145

(Repeated 1 times) (hamas announced thursday the end of the cease , fire with
israel::cease fire with israel was)

20) 'hamas announced on thursday the end of cease fire with , israel as' - 145

(Repeated 3 times) (hamas announced on thursday the end of cease , fire with israel::fire
with israel as)



Sorted by repetition 



1) announced thursday the end of cease , fire with israel - 250 (Score = 135 times) 

2) announced thursday the end of the cease , fire with israel - 101 (Score = 130 times) 

3) announced thursday the end of cease , fire israel - 65 (Score = 130 times) 

4) announced thursday the end of cease fire , with israel - 64 (Score = 135 times) 

5) announced thursday the end of cease , fire with israel and - 60 (Score = 130 times) 

6) announced thursday the end of cease fire , by israel - 58 (Score = 125 times) 

7) announced thursday the end of cease fire , by israel with - 50 (Score = 130 times) 

8) announced thursday the end of cease , fire with israel the - 50 (Score = 130 times) 

9) announced on thursday the end of cease , fire with israel - 47 (Score = 130 times) 

10) announced thursday the end of cease , fire with israel was - 47 (Score = 130 
times) 

11) announced thursday the end of its unilateral cease , fire with israel - 44 (Score = 
120 times) 

12) announced on thursday the end of the cease , fire with israel - 37 (Score = 125 
times) 

13) e announced thursday the end of cease , fire with israel - 31 (Score = 115 times) 

14) announced thursday the end of the cease fire , with israel - 31 (Score = 130 times) 

15) hamas announced thursday the end of cease , fire with israel - 28 (Score = 155 
times) 

16) hamas announced thursday the end of the cease , fire with israel - 24 (Score =
150 times)

17) announced thursday the end of its unilateral cease fire , with israel - 24 (Score = 
120 times) 

18) hamas announced on thursday the end of cease , fire with israel - 23 (Score = 150 
times) 

19) announced on thursday the end of its unilateral cease , fire with israel - 23 (Score 
= 115 times) 

20) announced thursday the end of the cease , fire israel - 22 (Score = 125 times) 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese' , 'cese del fuego con
israel' (2,null,8) -- (748)
No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves' , 'cese del fuego con israel' (2,null,8) -- (748)

No good source overlap 
{} 

x del fuego con israel was just translated and returned results 

Number of results = 604 

Translation for del fuego con israel took 0.634 

going to try and overlap this piece with the hashmap 

@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de' , 'del fuego con israel' (2,null,9) -- (604)

No good source overlap 
@@@ Pre 2 @@@ 
@@@ Post 2 @@@ 

Trying to overlap 'hamas anuncio este jueves el fin' , 'del fuego con israel' (2,null,9) -- (604)

No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el' , 'del fuego con israel' (2,null,9) -- (604)

No good source overlap 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego con israel' ,
'del fuego con israel' (2,null,9) -- (604)

No good source overlap 

@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego' , 'del fuego
con israel' (2,hamas anuncio este jueves el fin de su cese del fuego con israel,9) -- (604)

Got an overlap in source, checking target 

1500-604 

Overlap check for 'hamas anuncio este jueves el fin de su cese del fuego' , 'del fuego con
israel' took 3.242

*** hamas anuncio este jueves el fin de su cese del fuego (1500), (604)del
fuego con israel = hamas anuncio este jueves el fin de su cese del fuego con 
israel 

@@@ 2927 -> 0 

Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con israel 

1) 'hamas announced thursday the end of cease , fire with israel' - 155 (Repeated 28
times) (hamas announced thursday the end of, cease fire::cease fire with israel)

2) 'hamas announced thursday the end of cease fire , with israel' - 155 (Repeated 1
times) (null)

3) 'hamas announced on thursday the end of cease fire , with israel' - 150 (Repeated
1 times) (null)

4) 'hamas announced thursday the end of cease fire with , israel and' - 150 (Repeated
9 times) (null)

5) 'hamas announced thursday the end of the cease , fire with israel' - 150 (Repeated
24 times) (hamas announced thursday the end of the , cease fire::the cease fire with
israel) 

6) 'hamas announced thursday the end of cease , fire with israel the' - 150 (Repeated 
3 times) (hamas announced thursday the end of, cease fire::cease fire with israel the) 

7) 'hamas announced thursday the end of cease , fire israel' - 150 (Repeated 8 times) 
(null) 

8) 'hamas announced on thursday the end of cease , fire with israel' - 150 (Repeated 

23 times) (hamas announced on thursday the end of, cease fire::cease fire with israel) 

9) 'hamas announced thursday the end of cease , fire with israel was' - 150 (Repeated 
1 times) (null) 

10) 'hamas announced thursday the end of cease fire , by israel with' - 150 (Repeated 
3 times) (hamas announced thursday the end of cease , fire by::cease fire by israel with) 

11) 'hamas announced thursday the end of cease fire with , israel as' - 150 (Repeated
3 times) (null) 

12) 'hamas announced thursday the end of cease , fire with israel and' - 150 

(Repeated 9 times) (hamas announced thursday the end of, cease fire::cease fire with 
israel and) 

13) 'hamas announced thursday the end of cease fire with , israel was' - 150 
(Repeated 1 times) (null) 

14) 'hamas announced thursday the end of the cease fire , with israel' - 150 

(Repeated 1 times) (null) 

15) 'hamas announced thursday the end of cease fire with , israel the' - 150 

(Repeated 3 times) (null) 

16) 'hamas announced thursday the end of the cease fire with , israel the' - 145 

(Repeated 2 times) (null)

17) 'hamas announced on thursday the end of cease fire with , israel was' - 145
(Repeated 1 times) (null) 

18) 'hamas announced on thursday the end of the cease fire , with israel' - 145

(Repeated 1 times) (null) 

19) 'hamas announced thursday the end of the cease fire with , israel was' - 145 
(Repeated 1 times) (null) 

20) 'hamas announced on thursday the end of cease fire with , israel as' - 145 

(Repeated 3 times) (null) 



Sorted by repetition 



1) announced thursday the end of cease , fire with israel - 250 (Score = 135 times) 

2) announced thursday the end of the cease , fire with israel - 101 (Score = 130 times) 

3) announced thursday the end of cease , fire israel - 65 (Score = 130 times) 

4) announced thursday the end of cease fire , with israel - 64 (Score = 135 times) 

5) announced thursday the end of cease , fire with israel and - 60 (Score = 130 times) 

6) announced thursday the end of cease fire , by israel - 58 (Score = 125 times) 

7) announced thursday the end of cease fire , by israel with - 50 (Score = 130 times) 

8) announced thursday the end of cease , fire with israel the - 50 (Score = 130 times) 

9) announced on thursday the end of cease , fire with israel - 47 (Score = 130 times) 

10) announced thursday the end of cease , fire with israel was - 47 (Score = 130 
times) 

11) announced thursday the end of its unilateral cease , fire with israel - 44 (Score = 
120 times) 

12) announced on thursday the end of the cease , fire with israel - 37 (Score = 125 
times) 

13) e announced thursday the end of cease , fire with israel - 31 (Score = 115 times) 

14) announced thursday the end of the cease fire , with israel - 31 (Score = 130 times) 

15) hamas announced thursday the end of cease , fire with israel - 28 (Score = 155 
times) 

16) hamas announced thursday the end of the cease , fire with israel - 24 (Score = 
150 times) 

17) announced thursday the end of its unilateral cease fire , with israel - 24 (Score = 
120 times) 

18) hamas announced on thursday the end of cease , fire with israel - 23 (Score = 150 
times) 

19) announced on thursday the end of its unilateral cease , fire with israel - 23 (Score 
= 115 times) 

20) announced thursday the end of the cease , fire israel - 22 (Score = 125 times) 
@@@ Pre 2 @@@
@@@ Post 2 @@@

Trying to overlap 'hamas anuncio este jueves el fin de su cese del fuego con' , 'del
fuego con israel' (2,hamas anuncio este jueves el fin de su cese del fuego con israel,9) --
(604)

Got an overlap in source, checking target 
1500-604

Overlap check for 'hamas anuncio este jueves el fin de su cese del fuego con' , 'del fuego
con israel' took 2.82

*** hamas anuncio este jueves el fin de su cese del fuego con (1500), (604)del 
fuego con israel = hamas anuncio este jueves el fin de su cese del fuego con 
israel 

@@@ 1577 ->0 

Overlappp results for hamas anuncio este jueves el fin de su cese del fuego con israel 



1) 'hamas announced thursday the end of cease , fire with israel' - 155 (Repeated 28
times) (null)

2) 'hamas announced thursday the end of cease fire , with israel' - 155 (Repeated 1
times) (hamas announced thursday the end of cease , fire with::cease fire with israel)

3) 'hamas announced on thursday the end of cease fire , with israel' - 150 (Repeated
1 times) (hamas announced on thursday the end of cease , fire with::cease fire with israel)

4) 'hamas announced thursday the end of cease fire with , israel and' - 150 (Repeated
9 times) (hamas announced thursday the end of cease , fire with israel::cease fire with
israel and) 

5) 'hamas announced thursday the end of the cease , fire with israel' - 150 (Repeated 
24 times) (null) 

6) 'hamas announced thursday the end of cease , fire with israel the' - 150 (Repeated
3 times) (null) 

7) 'hamas announced thursday the end of cease , fire israel' - 150 (Repeated 8 times) 
(null) 

8) 'hamas announced on thursday the end of cease , fire with israel' - 150 (Repeated 
23 times) (null) 

9) 'hamas announced thursday the end of cease , fire with israel was' - 150 (Repeated 
1 times) (null) 

10) 'hamas announced thursday the end of cease fire , by israel with' - 150 (Repeated 
3 times) (hamas announced thursday the end of cease , fire by::cease fire by israel with) 

11) 'hamas announced thursday the end of cease fire with , israel as' - 150 (Repeated
3 times) (hamas announced thursday the end of cease , fire with israel::fire with israel as)

12) 'hamas announced thursday the end of cease , fire with israel and' - 150 
(Repeated 9 times) (null) 

13) 'hamas announced thursday the end of cease fire with , israel was' - 150 

(Repeated 1 times) (null) 

14) 'hamas announced thursday the end of the cease fire , with israel' - 150 

(Repeated 1 times) (hamas announced thursday the end of the cease , fire with::the cease 
fire with israel) 

15) 'hamas announced thursday the end of cease fire with , israel the' - 150 

(Repeated 3 times) (hamas announced thursday the end of cease , fire with israel::cease
fire with israel the) 

16) 'hamas announced thursday the end of the cease fire with , israel the' - 145 

(Repeated 2 times) (hamas announced thursday the end of the cease , fire with 
israel::cease fire with israel the)

17) 'hamas announced on thursday the end of cease fire with , israel was' - 145
(Repeated 1 times) (null)

18) 'hamas announced on thursday the end of the cease fire , with israel' - 145

(Repeated 1 times) (hamas announced on thursday the end of the cease , fire with::the
cease fire with israel)

19) 'hamas announced thursday the end of the cease fire with , israel was' - 145

(Repeated 1 times) (null)

20) 'hamas announced on thursday the end of cease fire with , israel as' - 145

(Repeated 3 times) (hamas announced on thursday the end of cease , fire with israel::fire
with israel as)



Sorted by repetition 



1) announced thursday the end of cease , fire with israel - 249 (Score = 135 times) 

2) announced thursday the end of the cease , fire with israel - 99 (Score = 130 times) 

3) announced thursday the end of cease , fire israel - 65 (Score = 130 times) 

4) announced thursday the end of cease fire , with israel - 64 (Score = 135 times) 

5) announced thursday the end of cease , fire with israel and - 59 (Score = 130 times) 

6) announced thursday the end of cease fire , by israel - 58 (Score = 125 times) 

7) announced thursday the end of cease , fire with israel the - 50 (Score = 130 times) 

8) announced thursday the end of cease fire , by israel with - 50 (Score = 130 times) 

9) announced thursday the end of cease , fire with israel was - 47 (Score = 130 times) 

10) announced on thursday the end of cease , fire with israel - 47 (Score = 130 times) 

11) announced thursday the end of its unilateral cease , fire with israel - 44 (Score = 
120 times) 

12) announced on thursday the end of the cease , fire with israel - 37 (Score = 125 
times) 

13) announced thursday the end of the cease fire , with israel - 31 (Score = 130 times) 

14) e announced thursday the end of cease , fire with israel - 30 (Score = 115 times) 

15) hamas announced thursday the end of cease , fire with israel - 28 (Score = 155 
times) 

16) hamas announced thursday the end of the cease , fire with israel - 24 (Score = 
150 times) 

17) announced thursday the end of its unilateral cease fire , with israel - 24 (Score = 
120 times) 

18) hamas announced on thursday the end of cease , fire with israel - 23 (Score = 150 
times) 

19) announced on thursday the end of its unilateral cease , fire with israel - 23 (Score 
= 115 times) 

20) announced thursday the end of the cease , fire israel - 22 (Score = 125 times) 
@@@Pre2@@@ 

@@@ Post2@@@ 

Trying to overlap 'hamas anuncio este jueves el fin de su cese' , 'del fuego con israel' 

(2,null,9) (604) 
No good source overlap 
@@@Pre2@@@ 
@@@ Post2@@@ 
Trying to overlap 'hamas anuncio este jueves' , 'del fuego con israel' (2,null,9) - (604) 
No good source overlap 

Final results for hamas anuncio este jueves el fin de su cese del 
fuego con israel (Ran 22 Overlap checks) 

1) hamas announced thursday the end of cease fire with israel - 155 (Repeated 35 
times) 

2) hamas announced thursday the end of cease fire by israel with - 150 (Repeated 3 
times) 

3) hamas announced thursday the end of cease fire with israel as - 150 (Repeated 3 
times) 

4) hamas announced thursday the end of cease fire israel - 150 (Repeated 8 times) 

5) hamas announced thursday the end of cease fire with israel and - 150 (Repeated 
9 times) 

6) hamas announced thursday the end of cease fire with israel was - 150 (Repeated 
1 times) 

7) hamas announced on thursday the end of cease fire with israel - 150 (Repeated 
29 times) 

8) hamas announced thursday the end of the cease fire with israel - 150 (Repeated 
28 times) 

9) hamas announced thursday the end of cease fire with israel the - 150 (Repeated 
3 times) 

10) hamas announced thursday the end of cease fire by israel - 145 (Repeated 4 
times) 

11) hamas announced on thursday the end of cease fire with israel was - 145 

(Repeated 1 times) 

12) hamas announced thursday the end of the cease fire by israel with - 145 

(Repeated 2 times) 

13) hamas announced on thursday the end of the cease fire with israel - 145 

(Repeated 25 times) 

14) hamas announced on thursday the end of cease fire israel - 145 (Repeated 7 
times) 

15) hamas announced thursday the end of cease fire israel is - 145 (Repeated 3 
times) 

16) hamas announced thursday the end of the cease fire israel - 145 (Repeated 7 
times) 

17) hamas announced on thursday the end of cease fire with israel and - 145 

(Repeated 8 times) 

18) hamas announced on thursday the end of cease fire with israel the - 145 

(Repeated 2 times) 

19) hamas announced thursday the end of the cease fire with israel the - 145 

(Repeated 2 times) 

20) hamas announced thursday the end of the cease fire with israel was - 145 

(Repeated 1 times) 

21) hamas announced on thursday the end of cease fire by israel with - 145 

(Repeated 2 times) 

22) hamas announced thursday the end of cease fire and israel - 145 (Repeated 4 
times) 

23) hamas announced thursday the end of the cease fire with israel as - 145 

(Repeated 3 times) 

24) hamas announced on thursday the end of cease fire with israel as - 145 

(Repeated 3 times) 

25) hamas announced thursday the end of the cease fire with israel and - 145 

(Repeated 8 times) 

26) hamas announced on thursday the end of cease fire israel is - 140 (Repeated 3 
times) 

27) hamas announced thursday the end of cease fire and on israel - 140 (Repeated 
4 times) 

28) hamas announced on thursday the end of the cease fire israel - 140 (Repeated 7 
times) 

29) hamas announced thursday the end of the cease fire by israel - 140 (Repeated 3 
times) 

30) hamas announced thursday the end of the cease fire and israel - 140 (Repeated 
3 times) 

31) hamas announced on thursday the end of cease fire and israel - 140 (Repeated 
3 times) 

32) hamas announced on thursday the end of the cease fire with israel and - 140 

(Repeated 8 times) 

33) hamas announced on thursday the end of the cease fire with israel the - 140 

(Repeated 2 times) 

34) hamas announced thursday the end of the cease fire israel is - 140 (Repeated 3 
times) 

35) hamas announced on thursday the end of the cease fire by israel with - 140 

(Repeated 2 times) 

36) hamas announced on thursday the end of cease fire by israel - 140 (Repeated 3 
times) 

37) hamas announced on thursday the end of the cease fire with israel was - 140 

(Repeated 1 times) 

38) hamas announced on thursday the end of the cease fire with israel as - 140 

(Repeated 3 times) 

39) hamas announced thursday the end of its unilateral cease fire with israel - 140 

(Repeated 20 times) 

40) hamas announced thursday the end of its unilateral cease fire with israel and - 

135 (Repeated 8 times) 

41) hamas announced thursday the end of its unilateral cease fire with israel was - 

135 (Repeated 1 times) 

42) hamas announced on thursday the end of its unilateral cease fire with israel - 

135 (Repeated 16 times) 

43) hamas announced thursday the end of the cease fire and on israel - 135 

(Repeated 4 times) 

44) hamas announced thursday the end of cease fire hudna with israel - 135 

(Repeated 3 times) 

45) hamas announced on thursday the end of the cease fire and israel - 135 

(Repeated 3 times) 

46) hamas announced thursday the end of cease fire and on israel to - 135 

(Repeated 3 times) 

47) hamas announced thursday the end of cease fire against israel with - 135 

(Repeated 2 times) 

48) announced thursday the end of cease fire with israel - 135 (Repeated 235 times) 

49) hamas announced on thursday the end of the cease fire by israel - 135 
(Repeated 3 times) 

50) hamas announced thursday the end of cease fire with israel defense - 135 

(Repeated 2 times) 

51) hamas announced on thursday the end of the cease fire israel is - 135 

(Repeated 3 times) 

52) hamas announced thursday the end of cease fire with israel since - 135 

(Repeated 1 times) 

53) hamas announced on thursday the end of cease fire and on israel - 135 
(Repeated 3 times) 

54) hamas announced thursday the end of cease fire with israel renew - 135 
(Repeated 3 times) 

55) hamas announced thursday the end of its unilateral cease fire with israel the - 
135 (Repeated 2 times) 

56) hamas announced thursday the end of its unilateral cease fire israel - 135 
(Repeated 7 times) 

57) hamas announced thursday the end of cease fire with israel when - 135 
(Repeated 4 times) 

58) hamas announced thursday the end of cease fire with israel but - 135 (Repeated 
3 times) 

59) hamas announced thursday the end of cease fire terms with israel - 135 
(Repeated 3 times) 

60) hamas announced thursday the end of its unilateral cease fire by israel with - 
135 (Repeated 2 times) 

61) hamas announced thursday the end of cease fire with israel defence - 135 
(Repeated 2 times) 

62) hamas announced thursday the end of cease fire with israel even - 135 
(Repeated 3 times) 

63) announced thursday the end of cease fire with israel the - 130 (Repeated 45 
times) 

64) hamas announced on thursday the end of cease fire and on israel to - 130 

(Repeated 2 times) 

65) hamas announced thursday the end of the cease fire against israel with - 130 

(Repeated 1 times) 

66) hamas announced thursday the end of cease fire then israel - 130 (Repeated 2 
times) 

67) hamas announced on thursday the end of its unilateral cease fire by israel with 

- 130 (Repeated 2 times) 

68) hamas announced thursday the end of the cease fire with israel since - 130 

(Repeated 1 times) 

69) hamas announced on thursday the end of cease fire with israel since - 130 

(Repeated 1 times) 

70) hamas announced on thursday the end of cease fire hudna with israel - 130 

(Repeated 3 times) 

71) hamas announced on thursday the end of cease fire with israel renew - 130 

(Repeated 2 times) 

72) announced thursday the end of the cease fire with israel - 130 (Repeated 91 
times) 

73) hamas announced thursday the end of cease fire declaration israel - 130 

(Repeated 2 times) 

74) hamas announced on thursday the end of cease fire with israel but - 130 

(Repeated 3 times) 

75) announced thursday the end of cease fire with israel and - 130 (Repeated 54 
times) 

76) hamas announced thursday the end of cease fire with israel when in - 130 

(Repeated 3 times) 

77) hamas announced thursday the end of cease fire with israel and pretty - 130 

(Repeated 2 times) 

78) hamas announced thursday the end of its unilateral cease fire by israel - 130 

(Repeated 3 times) 

79) announced on thursday the end of cease fire with israel - 130 (Repeated 50 
times) 

80) hamas announced on thursday the end of cease fire terms with israel - 130 

(Repeated 2 times) 

81) hamas announced thursday the end of cease fire between israel - 130 

(Repeated 3 times) 

82) hamas announced thursday the end of the cease fire with israel but - 130 

(Repeated 3 times) 

83) hamas announced thursday the end of the cease fire with israel defence - 130 

(Repeated 1 times) 

84) hamas announced on thursday the end of its unilateral cease fire israel - 130 

(Repeated 4 times) 

85) hamas announced on thursday the end of its unilateral cease fire with israel 

was - 130 (Repeated 1 times) 

86) hamas announced thursday the end of cease fire agreement israel - 130 

(Repeated 2 times) 

87) hamas announced thursday the end of cease fire israel should - 130 (Repeated 
2 times) 

88) hamas announced on thursday the end of cease fire against israel with - 130 

(Repeated 2 times) 

89) hamas announced thursday the end of cease fire israel conquered - 130 

(Repeated 2 times) 

90) hamas announced thursday the end of its unilateral cease fire israel is - 130 

(Repeated 3 times) 

91) hamas announced thursday the end of the cease fire and on israel to - 130 

(Repeated 3 times) 

92) hamas announced thursday the end of cease fire by israel with continued - 130 

(Repeated 2 times) 

93) hamas announced on thursday the end of cease fire with israel when - 130 

(Repeated 3 times) 

94) hamas announced on thursday the end of cease fire with israel defense - 130 

(Repeated 2 times) 

95) hamas announced on thursday the end of cease fire with israel even - 130 

(Repeated 1 times) 

96) hamas announced thursday the end of the cease fire with israel even - 130 

(Repeated 2 times) 

97) hamas announced thursday the end of the cease fire with israel renew - 130 

(Repeated 2 times) 

98) and announced thursday the end of cease fire with israel - 130 (Repeated 12 
times) 

99) hamas announced thursday the end of the cease fire terms with israel - 130 
(Repeated 2 times) 

100) announced thursday the end of cease fire israel - 130 (Repeated 55 times) 
Time so far took 101.26 (0) 